## CS 345S: Data-intensive Systems for the Next 1000x

The last decade saw enormous shifts in the design of large-scale data-intensive systems due to the rise of Internet services, cloud computing, and Big Data processing. Where will we see the next 1000x increases in scale and data volume, and how should data-intensive systems accordingly evolve? This course will critically examine a range of trends, including the Internet of Things, drones, smart cities, and emerging hardware capabilities, through the lens of software systems research and design. Students will perform a comparative analysis by reading and discussing cutting-edge research while performing their own original research. Prerequisites: Strong background in software systems, especially databases (CS 245) and distributed systems (CS 244B), and/or machine learning (CS 229). Undergraduates who have completed CS 245 are strongly encouraged to attend.

Last offered: Autumn 2016

## EARTHSYS 162: Data for Sustainable Development (CS 325B, EARTHSYS 262)

The sustainable development goals (SDGs) encompass many important aspects of human and ecosystem well-being that are traditionally difficult to measure. This project-based course will focus on ways to use inexpensive, unconventional data streams to measure outcomes relevant to SDGs, including poverty, hunger, health, governance, and economic activity. Students will apply machine learning techniques to various projects outlined at the beginning of the quarter. The main learning goals are to gain experience conducting and communicating original research. Prior knowledge of machine learning techniques, such as from CS 221, CS 229, CS 231N, STATS 202, or STATS 216, is required. Open to both undergraduate and graduate students. Enrollment limited to 24. Students must apply for the class by filling out the form at https://goo.gl/forms/9LSZF7lPkHadix5D3. A permission code will be given to admitted students to register for the class.

Terms: Aut, Win | Units: 3-5 | Repeatable for credit

Instructors: Burke, M. (PI); Ermon, S. (PI); Lobell, D. (PI); Perez, A. (PI); Perez, A. (TA); Yeh, C. (TA)

## EARTHSYS 262: Data for Sustainable Development (CS 325B, EARTHSYS 162)

The sustainable development goals (SDGs) encompass many important aspects of human and ecosystem well-being that are traditionally difficult to measure. This project-based course will focus on ways to use inexpensive, unconventional data streams to measure outcomes relevant to SDGs, including poverty, hunger, health, governance, and economic activity. Students will apply machine learning techniques to various projects outlined at the beginning of the quarter. The main learning goals are to gain experience conducting and communicating original research. Prior knowledge of machine learning techniques, such as from CS 221, CS 229, CS 231N, STATS 202, or STATS 216, is required. Open to both undergraduate and graduate students. Enrollment limited to 24. Students must apply for the class by filling out the form at https://goo.gl/forms/9LSZF7lPkHadix5D3. A permission code will be given to admitted students to register for the class.

Terms: Aut, Win | Units: 3-5 | Repeatable for credit

Instructors: Burke, M. (PI); Ermon, S. (PI); Lobell, D. (PI); Perez, A. (PI); Perez, A. (TA); Yeh, C. (TA)

## GENE 236: Deep Learning in Genomics and Biomedicine (BIODS 237, BIOMEDIN 273B, CS 273B)

Recent breakthroughs in high-throughput genomic and biomedical data are transforming biological sciences into "big data" disciplines. In parallel, progress in deep neural networks is revolutionizing fields such as image recognition, natural language processing and, more broadly, AI. This course explores the exciting intersection between these two advances. The course will start with an introduction to deep learning and an overview of the relevant background in genomics and high-throughput biotechnology, focusing on the available data and their relevance. It will then cover the ongoing developments in deep learning (supervised, unsupervised and generative models) with a focus on the applications of these methods to biomedical data, which are beginning to produce dramatic results. In addition to predictive modeling, the course emphasizes how to visualize and extract interpretable biological insights from such models. Recent papers from the literature will be presented and discussed. Students will be introduced to and work with popular deep learning software frameworks. Students will work in groups on a final class project using real-world datasets. Prerequisites: College calculus, linear algebra, basic probability and statistics such as CS 109, and basic machine learning such as CS 229. No prior knowledge of genomics is necessary.

Terms: Aut | Units: 3

Instructors: Kundaje, A. (PI); Zou, J. (PI); Ghorbani, A. (TA); Liu, R. (TA); Palamuttam, R. (TA)

## MS&E 234: Data Privacy and Ethics

This course engages with difficult ethical challenges in the modern practice of data science. The three main focuses are data privacy, personalization and targeting algorithms, and online experimentation. The focus on privacy will raise both practical and theoretical considerations. As part of the module on experimentation, students will be required to complete the Stanford IRB training for social and behavioral research. The course will assume a strong familiarity with the practice of machine learning and data science. Recommended: MS&E 226, MS&E 231, CS 229, or equivalents.

Terms: Spr | Units: 3

Instructors: Ugander, J. (PI); Arrieta Ibarra, I. (TA)

## STATS 231: Statistical Learning Theory (CS 229T)

How do we formalize what it means for an algorithm to learn from data? This course focuses on developing mathematical tools for answering this question. We will present various common learning algorithms and prove theoretical guarantees about them. Topics include classical asymptotics, method of moments, generalization bounds via uniform convergence, kernel methods, online learning, and multi-armed bandits. Prerequisites: A solid background in linear algebra and probability theory, statistics and machine learning (STATS 315A or CS 229). Convex optimization (EE 364A) is helpful but not required.

Last offered: Spring 2017
