CS 246: Mining Massive Data Sets

The availability of massive datasets is revolutionizing science and industry. This course discusses data mining and machine learning algorithms for analyzing very large amounts of data. Topics include: Big data systems (Hadoop, Spark); Link Analysis (PageRank, spam detection); Similarity search (locality-sensitive hashing, shingling, minhashing, random hyperplanes); Stream data processing; Analysis of social-network graphs; Association rules; Dimensionality reduction (UV, SVD, and CUR decompositions); Algorithms for very-large-scale mining (clustering, nearest-neighbor search); Large-scale machine learning (gradient descent, decision tree ensembles); Multi-armed bandits; Computational advertising. We also offer a sister class CS246H (Hadoop Labs) and a follow-up project-based class CS341 (Project in Mining Massive Data Sets). Prerequisites: At least one of CS107 or CS145.
Terms: Win | Units: 3-4 | Grading: Letter or Credit/No Credit
Instructors: Leskovec, J. (PI)

CS 341: Project in Mining Massive Data Sets

Students work in teams of three to solve a problem involving the analysis of a massive dataset. Each team must submit a proposal in early March. An information session, announced in CS246, will explain the available datasets in early March; this information will also be posted on the CS341 course website in late February. Each accepted team will be assigned a mentor who will work with them regularly throughout the quarter. Teams will also be given access to significant computing resources on a commercial public cloud.
Terms: Spr | Units: 3 | Grading: Letter or Credit/No Credit
© Stanford University