CS 246: Mining Massive Data Sets
The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce as a tool for creating parallel algorithms that can process very large amounts of data. Topics include: frequent itemsets and association rules, near-neighbor search in high-dimensional data, locality-sensitive hashing (LSH), dimensionality reduction, recommender systems, clustering, link analysis, large-scale machine learning, data streams, analysis of social-network graphs, and web advertising. Prerequisites: At least one of CS 107 or CS 145; at least one of CS 109 or STAT 116, or equivalent.
Terms: Win | Units: 3-4
Instructors: Leskovec, J. (PI); Ullman, J. (PI)
CS 246H: Mining Massive Data Sets Hadoop Lab
Supplement to CS 246 providing additional material on Hadoop. Students will learn how to implement data mining algorithms using Hadoop, how to implement and debug complex MapReduce jobs in Hadoop, and how to use some of the tools in the Hadoop ecosystem for data mining and machine learning. Topics: Hadoop, MapReduce, HDFS, combiners, secondary sort, distributed cache, SQL on Hadoop, Hive, Cloudera ML/Oryx, Mahout, Hadoop streaming, implementing Hadoop jobs, debugging Hadoop jobs, TF-IDF, Pig, Sqoop, Oozie, HBase, Impala. Prerequisite: CS 107 or equivalent.
Terms: Win | Units: 1
Instructors: Templeton, D. (PI)
CS 341: Project in Mining Massive Data Sets
Team project in data mining of very large-scale data, including the problem statement and the implementation and evaluation of a solution. Teams consist of three students each and meet regularly with a "coach" chosen from participating staff. Early lectures will cover the use of Amazon EC2 and certain systems such as Hadoop and Hive. Occasional lectures thereafter will feature outside speakers, special topics of interest, and progress reports by the teams. Prerequisite: CS 246.
Terms: Spr | Units: 3