STATS 32: Introduction to R for Undergraduates

This short course runs for weeks one through five of the quarter. It is recommended for undergraduate students who want to use R in the humanities or social sciences and for students who want to learn the basics of R programming. The goal of the short course is to familiarize students with R's tools for data analysis. Lectures will be interactive with a focus on learning by example, and assignments will be application-driven. No prior programming experience is needed. Topics covered include basic data structures, file I/O, data transformation and visualization, simple statistical tests, and some useful R packages. Prerequisite: undergraduate student. Priority given to non-engineering students. Laptops necessary for use in class.
Terms: Aut, Spr | Units: 1

STATS 48N: Riding the Data Wave (BIODS 48N)

Imagine collecting a bit of your saliva and sending it in to one of the personalized genomics companies: for very little money you will get back information about hundreds of thousands of variable sites in your genome. Records of exposure to a variety of chemicals in the areas you have lived are only a few clicks away on the web, as are thousands of studies and informal reports on the effects of different diets, to which you can compare your own. What does this all mean for you? Never before in history have humans recorded so much information about themselves and the world that surrounds them. Nor has this data been so readily available to the lay person. Expressions such as "data deluge" are used to describe such wealth as well as the loss of proper bearings that it often generates. How to summarize all this information in a useful way? How to boil down millions of numbers to just a meaningful few? How to convey the gist of the story in a picture without misleading oversimplifications? To answer these questions we need to consider the use of the data, appreciate the diversity that they represent, and understand how people instinctively interpret numbers and pictures. During each week, we will consider a different data set to be summarized with a different goal. We will review analysis of similar problems carried out in the past and explore if and how the same tools can be useful today. We will pay attention to contemporary media (newspapers, blogs, etc.) to identify settings similar to the ones we are examining and critique the displays and summaries documented there. Taking an experimental approach, we will evaluate the effectiveness of different data summaries in conveying the desired information by testing them on subsets of the enrolled students.
Last offered: Autumn 2020 | Units: 3 | UG Reqs: WAY-AQR, WAY-FR

STATS 60: Introduction to Statistical Methods: Precalculus (PSYCH 10, STATS 160)

Techniques for organizing data, computing, and interpreting measures of central tendency, variability, and association. Estimation, confidence intervals, tests of hypotheses, t-tests, correlation, and regression. Possible topics: analysis of variance and chi-square tests, computer statistical packages.
Terms: Aut, Win, Spr, Sum | Units: 5 | UG Reqs: GER:DB-Math, WAY-AQR, WAY-FR

STATS 100: Mathematics of Sports

This course will teach you how statistics and probability can be applied in sports in order to evaluate team and individual performance, build optimal in-game strategies, and ensure fairness between participants. Topics will include examples drawn from multiple sports such as basketball, baseball, soccer, football, and tennis. The course is intended to focus on data-based applications and will involve computations in R with real data sets via tutorial sessions and homework assignments. Prereqs: No statistical or programming background is assumed, but introductory courses, e.g., STATS 60, 101, or 116, are recommended. Prior knowledge of linear algebra (e.g., MATH 51) and basic probability is strongly recommended.
Terms: Win | Units: 3 | UG Reqs: GER:DB-Math, WAY-AQR

STATS 101: Data Science 101

This course will provide a hands-on introduction to statistics and data science. Students will engage with fundamental ideas in inferential and computational thinking. Each week consists of three lectures and two labs, in which students will manipulate real-world data and learn about statistical and computational tools. Topics covered include introductions to data visualization techniques, summary statistics, regression, prediction, sampling variability, statistical testing, inference, and replicability. The objectives of this course are to have students (1) be able to connect data to underlying phenomena and think critically about conclusions drawn from data analysis, and (2) be knowledgeable about how to carry out their own data analysis later. Some statistical background or programming experience is helpful, but not required. The class will start with a brief introduction to R but will move at a relatively fast pace. Freshmen and sophomores interested in data science, computing, and statistics are encouraged to attend. Also open to graduate students.
Last offered: Summer 2023 | Units: 5 | UG Reqs: GER:DB-NatSci, WAY-AQR

STATS 110: Statistical Methods in Engineering and the Physical Sciences

Introduction to statistics for engineers and physical scientists. Topics: descriptive statistics, probability, interval estimation, tests of hypotheses, nonparametric methods, linear regression, analysis of variance, elementary experimental design. Prerequisite: one year of calculus. Please note that students must enroll in one section in addition to the main lecture.
Terms: Aut | Units: 5 | UG Reqs: GER:DB-Math, WAY-AQR, WAY-FR

STATS 116: Theory of Probability

Probability spaces as models for phenomena with statistical regularity. Discrete spaces (binomial, hypergeometric, Poisson). Continuous spaces (normal, exponential) and densities. Random variables, expectation, independence, conditional probability. Introduction to the laws of large numbers and central limit theorem. Prerequisites: MATH 52 and familiarity with infinite series, or equivalent. Undergraduate students enroll for 5 units, graduate students enroll for 4 units. Undergraduate students must enroll in one section in addition to the main lecture. Sections are optional for graduate students. Note: Autumn 2023-24 is the last time this course will be offered. It will be replaced by STATS 117 and STATS 118 in 2024-25.
Terms: Aut | Units: 4-5 | UG Reqs: GER:DB-Math, WAY-AQR, WAY-FR

STATS 117: Theory of Probability I

Introduction to probability theory, including probability axioms, conditional probability, independence, random variables, and expectation. Joint, marginal, and conditional distributions. Discrete models (binomial, hypergeometric, Poisson) and continuous models (normal, exponential). Prerequisites: Single-variable calculus including infinite series (e.g., MATH 21) and at least one MATH course at Stanford. May not be taken for credit by students with credit in STATS 116, CS 109, MATH 151, or MS&E 120.
Terms: Spr, Sum | Units: 3

STATS 118: Theory of Probability II

Continuation of STATS 117, with a focus on probability topics useful for statistics. Sampling distributions of sums, means, variances, and order statistics of random variables. Convolutions, moment generating functions, and limit theorems. Probability distributions useful in statistics (gamma, beta, chi-square, t, multivariate normal). Prerequisites: a calculus-based first course in probability (such as STATS 117, CS 109, or MS&E 120) and multivariable calculus, including multiple integrals (MATH 52 or equivalent, can be taken concurrently). May not be taken for credit by students with credit in STATS 116.
Terms: Sum | Units: 4
Instructors: Hwang, J. (PI)

STATS 141: Biostatistics (BIO 141)

Introductory statistical methods for biological data: describing data (numerical and graphical summaries); introduction to probability; and statistical inference (hypothesis tests and confidence intervals). Intermediate statistical methods: comparing groups (analysis of variance); analyzing associations (linear and logistic regression); and methods for categorical data (contingency tables and odds ratio). Course content integrated with statistical computing in R.
Terms: Win | Units: 5 | UG Reqs: GER:DB-Math, WAY-AQR

STATS 160: Introduction to Statistical Methods: Precalculus (PSYCH 10, STATS 60)

Techniques for organizing data, computing, and interpreting measures of central tendency, variability, and association. Estimation, confidence intervals, tests of hypotheses, t-tests, correlation, and regression. Possible topics: analysis of variance and chi-square tests, computer statistical packages.
Terms: Aut, Win, Spr, Sum | Units: 5

STATS 191: Introduction to Applied Statistics

Statistical tools for modern data analysis. Topics include regression and prediction, elements of the analysis of variance, bootstrap, and cross-validation. Emphasis is on conceptual rather than theoretical understanding. Applications to social/biological sciences. Student assignments/projects require use of the software package R. Prerequisite: introductory statistical methods course. Recommended: 60, 110, or 141.
Terms: Spr, Sum | Units: 3 | UG Reqs: GER:DB-Math, WAY-AQR

STATS 195: Introduction to R

This short course runs for four weeks. It is recommended for students who want to use R in statistics, science or engineering courses, and for students who want to learn the basics of data science with R. The goal of the short course is to familiarize students with some of the most important R tools for data analysis. Lectures will focus on learning by example and assignments will be application-driven. No prior programming experience is assumed.
Terms: Win | Units: 1
Instructors: Zhang, I. (PI)

STATS 196A: Multilevel Modeling Using R (EDUC 401D)

See http://rogosateaching.com/stat196/. Multilevel data analysis examples using R. Topics include: two-level nested data, growth curve modeling, generalized linear models for counts and categorical data, nonlinear models, three-level analyses.
Last offered: Spring 2022 | Units: 1

STATS 199: Independent Study

For undergraduates.
Terms: Aut, Win, Spr, Sum | Units: 1-15 | Repeatable for credit

STATS 200: Introduction to Statistical Inference

Modern statistical concepts and procedures derived from a mathematical framework. Statistical inference, decision theory; point and interval estimation, tests of hypotheses; Neyman-Pearson theory. Bayesian analysis; maximum likelihood, large sample theory. Prerequisite: STATS 116. Please note that students must enroll in one section in addition to the main lecture.
Terms: Aut, Win, Sum | Units: 4

STATS 202: Data Mining and Analysis

Data mining is used to discover patterns and relationships in data. Emphasis is on large complex data sets such as those in very large databases or through web mining. Topics: decision trees, association rules, clustering, case-based methods, and data visualization. Prereqs: Introductory courses in statistics or probability (e.g., STATS 60), linear algebra (e.g., MATH 51), and computer programming (e.g., CS 105). May not be taken for credit by students with credit in STATS 216 or 216V.
Terms: Aut, Sum | Units: 3

STATS 203: Introduction to Regression Models and Analysis of Variance

Modeling and interpretation of observational and experimental data using linear and nonlinear regression methods. Model building and selection methods. Multivariable analysis. Fixed and random effects models. Experimental design. Prerequisites: A post-calculus introductory probability course, e.g. STATS 116, basic computer programming knowledge, some familiarity with matrix algebra, and a pre- or co-requisite post-calculus mathematical statistics course, e.g. STATS 200.
Terms: Win | Units: 3

STATS 203V: Introduction to Regression Models and Analysis of Variance

Modeling and interpretation of observational and experimental data using linear and nonlinear regression methods. Model building and selection methods. Multivariable analysis. Fixed and random effects models. Experimental design. This course is offered remotely only via video segments (MOOC style). TAs will host remote weekly office hours using an online platform such as Zoom. Prerequisites: A post-calculus introductory probability course, e.g. STATS 116, basic computer programming knowledge, some familiarity with matrix algebra, and a pre- or co-requisite post-calculus mathematical statistics course, e.g. STATS 200.
Last offered: Summer 2023 | Units: 3

STATS 204: Sampling

How best to take data and where to sample it. Examples include surveys and sampling from data warehouses. Emphasis is on methods for finite populations. Topics: simple random sampling, stratified sampling, cluster sampling, ratio and regression estimators, two stage sampling.
Last offered: Spring 2023 | Units: 3

STATS 205: Introduction to Nonparametric Statistics

Nonparametric regression and nonparametric density estimation, modern nonparametric techniques, nonparametric confidence interval estimates, nearest neighbor algorithms (with non-linear features), wavelets, bootstrap. Nonparametric analogs of the one- and two-sample t-tests and analysis of variance.
Terms: Spr | Units: 3

STATS 206: Applied Multivariate Analysis (BIODS 206)

Introduction to the statistical analysis of several quantitative measurements on each observational unit. Emphasis is on concepts and computer-intensive methods. Examples from economics, education, geology, psychology. Topics: multiple regression, multivariate analysis of variance, principal components, factor analysis, canonical correlations, multidimensional scaling, clustering. Pre- or corequisite: 200.
Terms: Aut | Units: 3
Instructors: Owen, A. (PI); Li, H. (TA)

STATS 207: Introduction to Time Series Analysis (STATS 307)

Time series models used in economics and engineering. Trend fitting, autoregressive and moving average models and spectral analysis, Kalman filtering, and state-space models. Seasonality, transformations, and introduction to financial time series. Prerequisite: basic course in Statistics at the level of 200.
Terms: Spr | Units: 3

STATS 208: Bootstrap, Cross-Validation, and Sample Re-use

By re-using the sample data, sometimes in ingenious ways, we can evaluate the accuracy of predictions, test the significance of a conclusion, place confidence bounds on an unknown parameter, select the best prediction architecture, and develop more accurate predictors. In this course, we will describe the many ways that samples get reused to achieve these goals, including the bootstrap, the parametric bootstrap, cross-validation, conformal prediction, random forests, and sample splitting. We also develop basic theory justifying such methods. Prerequisite: course in statistics or probability.
Terms: Win | Units: 3
Instructors: Donoho, D. (PI); Wang, Y. (TA)

STATS 209: Introduction to Causal Inference

This course introduces the fundamental ideas and methods in causal inference, with examples drawn from education, economics, medicine, and digital marketing. Topics include potential outcomes, randomization, observational studies, matching, covariate adjustment, AIPW, heterogeneous treatment effects, instrumental variables, regression discontinuity, and synthetic controls. Prerequisites: basic probability and statistics, familiarity with R.
Terms: Aut | Units: 3

STATS 209B: Applications of Causal Inference Methods (EDUC 260A, EPI 239)

See http://rogosateaching.com/stat209/. Application of potential outcomes formulation for causal inference to research settings including: mediation, compliance adjustments, time-1 time-2 designs, encouragement designs, heterogeneous treatment effects, aggregated data, instrumental variables, analysis of covariance regression adjustments, and implementations of matching methods. Prerequisite: an introduction to causal inference methods such as STATS 209.
Last offered: Winter 2022 | Units: 2

STATS 211: Meta-research: Appraising Research Findings, Bias, and Meta-analysis (CHPR 206, EPI 206, MED 206)

Open to graduate, medical, and undergraduate students. Appraisal of the quality and credibility of research findings; evaluation of sources of bias. Meta-analysis as a quantitative (statistical) method for combining results of independent studies. Examples from medicine, epidemiology, genomics, ecology, social/behavioral sciences, education. Collaborative analyses. Project involving generation of a meta-research project or reworking and evaluation of an existing published meta-analysis. Prerequisite: knowledge of basic statistics.
Terms: Win | Units: 3

STATS 214: Machine Learning Theory (CS 229M)

How do we use mathematical thinking to design better machine learning methods? This course focuses on developing mathematical tools for answering this question. This course will cover fundamental concepts and principled algorithms in machine learning, particularly those that are related to modern large-scale non-linear models. The topics include concentration inequalities, generalization bounds via uniform convergence, non-convex optimization, implicit regularization effect in deep learning, and unsupervised learning and domain adaptations. Prerequisites: linear algebra (MATH 51 or CS 205), probability theory (STATS 116, MATH 151 or CS 109), and machine learning (CS 229, STATS 229, or STATS 315A).
Terms: Aut | Units: 3

STATS 215: Statistical Models in Biology

Poisson and renewal processes, Markov chains in discrete and continuous time, branching processes, diffusion. Applications to models of nucleotide evolution, recombination, the Wright-Fisher process, coalescence, genetic mapping, sequence analysis. Theoretical material approximately the same as in STATS 217, but emphasis is on examples drawn from applications in biology, especially genetics. Prerequisite: 116 or equivalent.
Terms: Win | Units: 3

STATS 216: Introduction to Statistical Learning

Overview of supervised learning, with a focus on regression and classification methods. Syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines; some unsupervised learning: principal components and clustering (k-means and hierarchical). Computing is done in R, through tutorial sessions and homework assignments. This math-light course is offered via video segments (MOOC style) and in-class problem-solving sessions. Prereqs: Introductory courses in statistics or probability (e.g., STATS 60 or STATS 101), linear algebra (e.g., MATH 51), and computer programming (e.g., CS 105). May not be taken for credit by students with credit in STATS 202 or STATS 216V.
Terms: Win | Units: 3

STATS 216V: Introduction to Statistical Learning

Overview of supervised learning, with a focus on regression and classification methods. Syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines; some unsupervised learning: principal components and clustering (k-means and hierarchical). Computing is done in R, through tutorial sessions and homework assignments. This math-light course is offered remotely only via video segments (MOOC style). TAs will host remote weekly office hours using an online platform such as Zoom. There are four homework assignments, a midterm, and a final exam, all of which are administered remotely. Prereqs: Introductory courses in statistics or probability (e.g., STATS 60 or STATS 101), linear algebra (e.g., MATH 51), and computer programming (e.g., CS 105). May not be taken for credit by students with credit in STATS 202 or STATS 216.
Terms: Sum | Units: 3
Instructors: Bodwin, K. (PI)

STATS 217: Introduction to Stochastic Processes I

Discrete and continuous time Markov chains, Poisson processes, random walks, branching processes, first passage times, recurrence and transience, stationary distributions. Non-Statistics master's students may want to consider taking STATS 215 instead. Prerequisite: a post-calculus introductory probability course, e.g. STATS 116.
Terms: Win, Sum | Units: 3

STATS 218: Introduction to Stochastic Processes II

Renewal theory, Brownian motion, Gaussian processes, second order processes, martingales.
Terms: Spr | Units: 3
Instructors: Li, S. (PI); Zhou, Y. (TA)

STATS 219: Stochastic Processes (MATH 136)

Introduction to measure theory, Lp spaces and Hilbert spaces. Random variables, expectation, conditional expectation, conditional distribution. Uniform integrability, almost sure and Lp convergence. Stochastic processes: definition, stationarity, sample path continuity. Examples: random walk, Markov chains, Gaussian processes, Poisson processes, martingales. Construction and basic properties of Brownian motion. Prerequisite: STATS 116 or MATH 151 or equivalent. Recommended: MATH 115 or equivalent. http://statweb.stanford.edu/~adembo/math-136/
Terms: Win | Units: 4

STATS 220: Machine Learning Methods for Neural Data Analysis (CS 339N, NBIO 220, STATS 320)

With modern high-density electrodes and optical imaging techniques, neuroscientists routinely measure the activity of hundreds, if not thousands, of cells simultaneously. Coupled with high-resolution behavioral measurements, genetic sequencing, and connectomics, these datasets offer unprecedented opportunities to learn how neural circuits function. This course will study statistical machine learning methods for analyzing such datasets, including: spike sorting, calcium deconvolution, and voltage smoothing techniques for extracting relevant signals from raw data; markerless tracking methods for estimating animal pose in behavioral videos; network models for connectomics and fMRI data; state space models for analysis of high-dimensional neural and behavioral time-series; point process models of neural spike trains; and deep learning methods for neural encoding and decoding. We will develop the theory behind these models and algorithms and then apply them to real datasets in the homeworks and final project. This course is similar to STATS 215: Statistical Models in Biology and STATS 366: Modern Statistics for Modern Biology, but it is specifically focused on statistical machine learning methods for neuroscience data. Prerequisites: Students should be comfortable with basic probability (STATS 116) and statistics (at the level of STATS 200). This course will place a heavy emphasis on implementing models and algorithms, so coding proficiency is required.
Last offered: Winter 2023 | Units: 3

STATS 221: Random Processes on Graphs and Lattices

Covers modern topics in the study of random processes on graphs and lattices. Specifically, a subset of: random walks, electrical networks and flows; uniform spanning trees; percolation and self-avoiding walks; contact process, voter model, and the exclusion process; Ising, Potts, and random-cluster models; random graphs. Prerequisites: MATH 115 (or equivalent), STATS 217 (or equivalent).
Last offered: Winter 2022 | Units: 3

STATS 223: Sequential Analysis (STATS 323)

This course will survey the history of sequential analysis from its origin in the 1940s via its continuing role in clinical trials to current activity in machine learning. Subject to the limitations of time, the following topics will be discussed: parametric and semi-parametric hypothesis testing from Wald to sequential clinical trials; fixed precision estimation; change-point detection and estimation; iterative stochastic algorithms and machine learning; anytime-valid inference; optimal stopping, dynamic programming, and stochastic control; multi-armed bandits; applications. Prerequisites: for 223, Stats 200 or equivalent; for 323, Stats 300A and 310A.
Terms: Aut | Units: 3

STATS 229: Machine Learning (CS 229)

Topics: statistical pattern recognition, linear and non-linear regression, non-parametric methods, exponential family, GLMs, support vector machines, kernel methods, deep learning, model/feature selection, learning theory, ML advice, clustering, density estimation, EM, dimensionality reduction, ICA, PCA, reinforcement learning and adaptive control, Markov decision processes, approximate dynamic programming, and policy search. Prerequisites: knowledge of basic computer science principles and skills at a level sufficient to write a reasonably non-trivial computer program in Python/NumPy to the equivalency of CS 106A, CS 106B, or CS 106X, familiarity with probability theory to the equivalency of CS 109, MATH 151, or STATS 116, and familiarity with multivariable calculus and linear algebra to the equivalency of MATH 51 or CS 205.
Terms: Aut, Win, Sum | Units: 3-4

STATS 232: Machine Learning for Sequence Modeling (CS 229B)

Sequence data and time series are becoming increasingly ubiquitous in fields as diverse as bioinformatics, neuroscience, health, environmental monitoring, finance, speech recognition/generation, video processing, and natural language processing. Machine learning has become an indispensable tool for analyzing such data; in fact, sequence models lie at the heart of recent progress in AI like GPT3. This class integrates foundational concepts in time series analysis with modern machine learning methods for sequence modeling. Connections and key differences will be highlighted, as well as how grounding modern neural network approaches with traditional interpretations can enable powerful leaps forward. You will learn theoretical fundamentals, but the focus will be on gaining practical, hands-on experience with modern methods through real-world case studies. You will walk away with a broad and deep perspective of sequence modeling and key ways in which such data are not just 1D images.
Terms: Aut | Units: 3-4
Instructors: Fox, E. (PI)

STATS 237: Investment Portfolios, Derivative Securities, and Risk Measures

Asset returns and their volatilities. Markowitz portfolio theory, capital asset pricing model, multifactor pricing models. Measures of market risk and statistical models and methods for their estimation and backtesting. Financial derivatives and hedging. Black-Scholes pricing of European options and implied volatilities. Prerequisite: STATS 116 or equivalent.
Last offered: Summer 2019 | Units: 3

STATS 240: Statistical Methods in Finance

(SCPD students register for 240P.) Regression analysis and applications to investment models. Principal components and multivariate analysis. Likelihood inference and Bayesian methods. Financial time series. Estimation and modeling of volatilities. Statistical methods for portfolio management. Prerequisite: STATS 200 or equivalent.
Last offered: Autumn 2021 | Units: 3

STATS 241: Data-driven Financial Econometrics

(SCPD students register for 241P) Approximate dynamic programming and time series approaches in options, interest rate, and credit markets. Nonlinear least squares, nonparametric regression and model selection. Behavioral finance and efficient markets. Economic capital, risk measures, and regulatory supervision. Quantile regression, extreme value theory, and applications to market risk analytics. Empirical Bayes approach to pricing insurance contracts. Corporate bonds, bond ratings, and corporate default analytics. Prerequisite or corequisite: STATS 240 or equivalent.
Last offered: Spring 2021 | Units: 3

STATS 242: NeuroTech Training Seminar (NSUR 239)

This is a required course for students in the NeuroTech training program, and is also open to other graduate students interested in learning the skills necessary for neurotechnology careers in academia or industry. Over the academic year, topics will include: emerging research in neurotechnology, communication skills, team science, leadership and management, intellectual property, entrepreneurship and more.
Terms: Aut, Win, Spr | Units: 1 | Repeatable 9 times (up to 9 units total)

STATS 243: Risk Analytics and Management in Finance and Insurance (CME 243)

Market risk and credit risk, credit markets. Back testing, stress testing and Monte Carlo methods. Logistic regression, generalized linear models and generalized mixed models. Loan prepayment and default as competing risks. Survival and hazard functions, correlated default intensities, frailty and contagion. Risk surveillance, early warning and adaptive control methodologies. Banking and bank regulation, asset and liability management. Prerequisite: STATS 240 or equivalent.
Last offered: Winter 2022 | Units: 3

STATS 244: Quantitative Trading: Algorithms, Data, and Optimization

Statistical trading rules and performance evaluation. Active portfolio management and dynamic investment strategies. Data analytics and models of transactions data. Limit order book dynamics in electronic exchanges. Algorithmic trading, informatics, and optimal execution. Market making and inventory control. Risk management and regulatory issues. Prerequisites: STATS 240 or equivalent.
Last offered: Autumn 2016 | Units: 2-4

STATS 245: Data, Models and Applications to Healthcare Analytics

Topics on fundamentals of data science, biological and statistical models, application to medical product safety evaluation, health risk models and their evaluation, benefit-risk assessment and multi-criteria decision analytics. Applications to environmental health, nutritional epidemiology, wellness and prevention will also be discussed. Prerequisite: Graduate students - STATS 202 or 216, or CS 229; Undergraduate students - consent of instructor.
Last offered: Summer 2019 | Units: 3

STATS 249: Experimental Immersion in Neuroscience (NSUR 249)

This course provides students from technical backgrounds (e.g., physics, applied physics, electrical or chemical engineering, bioengineering, computer science, statistics) the opportunity to learn how they can apply their expertise to advancing experimental research in the neurosciences. Students will visit one neuroscience lab per week to watch experiments, understand the technical apparatus and animal models being used, discuss the questions being addressed, and interact with students and others conducting the research. This course is strongly encouraged for students who wish to apply to the NeuroTech graduate training program. The course has limited enrollment. If you are interested in registering, please complete the form at https://forms.gle/QXmkVfCqeS4zHmwB7 beforehand, and someone will follow up with you with a permission code.
Terms: Aut | Units: 1

STATS 250: Mathematical Finance (MATH 238)

Stochastic models of financial markets. Risk neutral pricing for derivatives, hedging strategies and management of risk. Multidimensional portfolio theory and introduction to statistical arbitrage. Prerequisite: Math 136 or equivalent. NOTE: Undergraduates require instructor permission to enroll. Undergraduates interested in taking the course should contact the instructor for permission, providing information about relevant background such as other courses taken.
Terms: Win | Units: 3
Instructors: Papanicolaou, G. (PI)

STATS 251: Clinical Trial Design in the Age of Precision Medicine (BIODS 250)

This course offers an overview of the statistical foundations of modern clinical trial design in precision medicine research. Starting from a quick review of the traditional clinical development paradigm, from Phase I to III clinical trials for medical product approval through Phase IV post-marketing studies for safety evaluation, and of its challenges in time and societal costs, we will introduce recently developed innovative designs and their statistical methodology across all phases of clinical trials. You are expected to learn the statistical considerations for novel phase I-II trial designs, master protocols for umbrella, platform and basket trials, adaptive and enrichment designs including subgroup selections, estimands, surrogate and composite endpoints, integration of real-world evidence and patient-focused medical product development, and meta-analysis of clinical trial endpoints. Prerequisites: Working knowledge of statistics and R.
Terms: Win | Units: 3

STATS 256: Modern Statistics for Modern Biology (BIOS 221, STATS 366)

Application-based course in nonparametric statistics. Modern toolbox of visualization and statistical methods for the analysis of data, with examples drawn from immunology, microbiology, cancer research and ecology. Methods covered include multivariate methods (PCA and extensions), sparse representations (trees, networks, contingency tables) as well as nonparametric testing (bootstrap, permutation and Monte Carlo methods). Hands-on; uses R and covers many Bioconductor packages. Prerequisite: Working knowledge of R and two core Biology courses. Note that the 155 offering is a writing intensive course for undergraduates only and requires instructor consent. (WIM). See https://web.stanford.edu/class/bios221/index.html
Terms: Aut | Units: 3

STATS 260A: Workshop in Biostatistics (BIODS 260A)

Applications of data science techniques to current problems in biology, medicine and healthcare. To receive credit for one or two units, a student must attend every workshop. To receive two units, in addition to attending every workshop, the student is required to write a two-page critical summary of one of the workshops, with the choice made by the student.
Terms: Aut | Units: 1-2 | Repeatable for credit

STATS 260B: Workshop in Biostatistics (BIODS 260B)

Applications of data science techniques to current problems in biology, medicine and healthcare. To receive credit for one or two units, a student must attend every workshop. To receive two units, in addition to attending every workshop, the student is required to write a two-page critical summary of one of the workshops, with the choice made by the student.
Terms: Win | Units: 1-2 | Repeatable for credit

STATS 260C: Workshop in Biostatistics (BIODS 260C)

Applications of data science techniques to current problems in biology, medicine and healthcare. To receive credit for one or two units, a student must attend every workshop. To receive two units, in addition to attending every workshop, the student is required to write a two-page critical summary of one of the workshops, with the choice made by the student.
Terms: Spr | Units: 1-2 | Repeatable for credit

STATS 261: Intermediate Biostatistics: Analysis of Discrete Data (BIOMEDIN 233, EPI 261)

Methods for analyzing data from case-control and cross-sectional studies: the 2x2 table, chi-square test, Fisher's exact test, odds ratios, Mantel-Haenszel methods, stratification, tests for matched data, logistic regression, conditional logistic regression. Emphasis is on data analysis in SAS or R. Special topics: cross-fold validation and bootstrap inference.
Terms: Win | Units: 3

STATS 262: Intermediate Biostatistics: Regression, Prediction, Survival Analysis (EPI 262)

Methods for analyzing longitudinal data. Topics include Kaplan-Meier methods, Cox regression, hazard ratios, time-dependent variables, longitudinal data structures, profile plots, missing data, modeling change, MANOVA, repeated-measures ANOVA, GEE, and mixed models. Emphasis is on practical applications. Prerequisites: basic ANOVA and linear regression.
Terms: Spr | Units: 3

STATS 263: Design of Experiments (STATS 363)

Experiments vs observation. Confounding. Randomization. ANOVA. Blocking. Latin squares. Factorials and fractional factorials. Split plot. Response surfaces. Mixture designs. Optimal design. Central composite. Box-Behnken. Taguchi methods. Computer experiments and space filling designs. Prerequisites: probability at STATS 116 level or higher, and at least one course in linear models.
Last offered: Autumn 2022 | Units: 3

STATS 264: Foundations of Statistical and Scientific Inference (EPI 264)

The course will consist of readings and discussion of foundational papers and book sections in the domains of statistical and scientific inference. Topics to be covered include philosophy of science, interpretations of probability, Bayesian and frequentist approaches to statistical inference and current controversies about the proper use of p-values and research reproducibility. Recommended preparation: At least 2 quarters of biostatistics and one of epidemiology. Intended for second year Masters students or PhD students with at least 1 year of preceding graduate training.
Terms: Aut | Units: 1
Instructors: Goodman, S. (PI)

STATS 270: Bayesian Statistics (STATS 370)

This course will treat Bayesian statistics at a relatively advanced level. Assuming familiarity with standard probability and multivariate distribution theory, we will provide a discussion of the mathematical and theoretical foundation for Bayesian inferential procedures. In particular, we will examine the construction of priors and the asymptotic properties of likelihoods and posterior distributions. The discussion will include but will not be limited to the case of finite dimensional parameter space. There will also be some discussions on the computational algorithms useful for Bayesian inference. Prerequisites: Stats 116 or equivalent probability course, plus basic programming knowledge; basic calculus, analysis and linear algebra strongly recommended; Stats 200 or equivalent statistical theory course desirable.
Terms: Spr | Units: 3
Instructors: Wong, W. (PI); Lu, S. (TA)

STATS 271: Applied Bayesian Statistics (STATS 371)

This course is a modern treatment of applied Bayesian statistics with a focus on high-dimensional problems. We will study a collection of canonical methods that see heavy use in applications, including high-dimensional linear and generalized linear models, hierarchical/random effects models, Gaussian processes, variable-dimension and Dirichlet process mixtures, graphical models, and methods used in Bayesian inverse problems. Each method will be accompanied by one or more motivating datasets. Through these examples the course will cover: (1) Bayesian hypothesis testing, multiplicity correction, selection, shrinkage, and model averaging; (2) prior choice; (3) Frequentist properties of Bayesian procedures in high dimensions; and (4) computation by Markov chain Monte Carlo, including constructing efficient Gibbs, Metropolis, and more exotic samplers, empirical convergence analysis, strategies for scaling computation to high dimensions (approximations, divide-and-conquer, minibatching, et cetera), and the theory of convergence rates.
Last offered: Spring 2021 | Units: 3

STATS 281: Statistical Analysis of Fine Art

This course presents the application of rigorous statistical analysis, machine learning, and data analysis to problems in the history and interpretation of fine art paintings, drawings, and other two-dimensional artworks. The course focuses on the aspects of these problems that are unlike those addressed widely elsewhere in statistical image analysis, such as applied to photographs, videos, and medical images. These novel problems include statistical analysis of brushstrokes and marks, medium, inferring artists' working methods, compositional principles, stylometry (quantification of style), the tracing of artistic influence, and art attribution and authentication. The course revisits classic problems, such as image-based object recognition and scene description, but in the environment of highly non-realistic, stylized artworks. Prerequisites: a course in machine learning, pattern recognition, or introductory data science; expertise in a high-level programming language of your choice (Matlab, Mathematica, R, Python, C/C++, ...); implementation knowledge of deep neural networks in a framework of your choice (PyTorch, TensorFlow, Keras, ...). Recommended: a course in Art and Art History; a course in image processing.
Last offered: Autumn 2021 | Units: 3

STATS 285: Massive Computational Experiments, Painlessly

Ambitious Data Science requires massive computational experimentation; the entry ticket for a solid PhD in some fields is now to conduct experiments involving 1 million CPU hours. Recently several groups have created efficient computational environments that make it painless to run such massive experiments. This course reviews state-of-the-art practices for doing massive computational experiments on compute clusters in a painless and reproducible manner. Students will learn how to automate their computing experiments, first using nuts-and-bolts tools such as Perl and Bash, and later using available comprehensive frameworks such as ClusterJob and CodaLab, which enable them to take on ambitious Data Science projects. The course also features a few guest lectures by renowned scientists in the field of Data Science. Students should have a familiarity with computational experiments and be facile in some high-level computer language such as R, Matlab, or Python.
Terms: Aut | Units: 2
Instructors: Donoho, D. (PI)

STATS 290: Computing for Data Science

Programming and computing techniques for the requirements of data science: acquisition and organization of data; visualization, modelling and inference for scientific applications; presentation and interactive communication of results. Emphasis on computing for substantial projects. Software development with emphasis on R, plus other key software tools. Prerequisites: Programming experience including familiarity with R; computing at least at the level of CS 106; statistics at the level of STATS 110 or 141.
Last offered: Winter 2020 | Units: 3

STATS 298: Industrial Research for Statisticians

Masters-level research as in 299, but conducted for an off-campus employer with the approval and supervision of a faculty adviser. Students must submit a written final report upon completion of the internship in order to receive credit. Repeatable for credit. Prerequisite: enrollment in Statistics M.S. program. IMPORTANT: F-1 international students enrolled in this CPT course cannot start working without first obtaining a CPT-endorsed I-20 from Bechtel International Center (enrolling in the CPT course alone is insufficient to meet federal immigration regulations).
Terms: Aut, Win, Spr, Sum | Units: 1 | Repeatable 3 times (up to 3 units total)

STATS 299: Independent Study

For Statistics M.S. students only. Reading or research program under the supervision of a Statistics faculty member. May be repeated for credit.
Terms: Aut, Win, Spr, Sum | Units: 1-5 | Repeatable for credit

STATS 300A: Theory of Statistics I

Finite sample optimality of statistical procedures; Decision theory: loss, risk, admissibility; Principles of data reduction: sufficiency, ancillarity, completeness; Statistical models: exponential families, group families, nonparametric families; Point estimation: optimal unbiased and equivariant estimation, Bayes estimation, minimax estimation; Hypothesis testing and confidence intervals: uniformly most powerful tests, uniformly most accurate confidence intervals, optimal unbiased and invariant tests. Prerequisites: Real analysis, introductory probability (at the level of STATS 116), and introductory statistics.
Terms: Aut | Units: 3

STATS 300B: Theory of Statistics II

Elementary decision theory; loss and risk functions, Bayes estimation; UMVU estimator, minimax estimators, shrinkage estimators. Hypothesis testing and confidence intervals: Neyman-Pearson theory; UMP tests and uniformly most accurate confidence intervals; use of unbiasedness and invariance to eliminate nuisance parameters. Large sample theory: basic convergence concepts; robustness; efficiency; contiguity, locally asymptotically normal experiments; convolution theorem; asymptotically UMP and maximin tests. Asymptotic theory of likelihood ratio and score tests. Rank, permutation and randomization tests; jackknife, bootstrap, subsampling and other resampling methods. Further topics: sequential analysis, optimal experimental design, empirical processes with applications to statistics, Edgeworth expansions, density estimation, time series.
Terms: Win | Units: 3

STATS 300C: Theory of Statistics III

Decision theory formulation of statistical problems. Minimax, admissible procedures. Complete class theorems ("all" minimax or admissible procedures are "Bayes"), Bayes procedures, conjugate priors, hierarchical models. Bayesian nonparametrics: Dirichlet, tail-free, Pólya trees, Bayesian sieves. Inconsistency of Bayes rules.
Terms: Spr | Units: 3

STATS 301: Statistics Teaching Practicum

Ordinarily for Statistics first year PhD students. Discussion of effective teaching, assessment, and course design. Students practice teaching in a guided environment. There will be a total of 10 course meetings spread out across autumn, winter, and spring quarters, but students enroll in spring quarter.
Terms: Spr | Units: 1 | Repeatable 3 times (up to 3 units total)
Instructors: Sun, D. (PI)

STATS 302: Qualifying Exams Workshop

Prepares Statistics Ph.D. students for the qualifying exams by reviewing relevant course topics and problem solving strategies.
Terms: Sum | Units: 5-10

STATS 303: Statistics Faculty Research Presentations

For Statistics first and second year PhD students only. Discussion of statistics topics and research areas; consultation with PhD advisors.
Terms: Aut | Units: 1 | Repeatable 2 times (up to 2 units total)
Instructors: Taylor, J. (PI)

STATS 305A: Applied Statistics I

Statistics of real valued responses. Review of multivariate normal distribution theory. Univariate regression. Multiple regression. Constructing features from predictors. Geometry and algebra of least squares: subspaces, projections, normal equations, orthogonality, rank deficiency, Gauss-Markov. Gram-Schmidt, the QR decomposition and the SVD. Interpreting coefficients. Collinearity. Dependence and heteroscedasticity. Fits and the hat matrix. Model diagnostics. Model selection, Cp/AIC and crossvalidation, stepwise, lasso. Multiple comparisons. ANOVA, fixed and random effects. Use of bootstrap and permutations. Emphasis on problem sets involving substantive computations with data sets. Prerequisites: consent of instructor, 116, 200, applied statistics course, CS 106A, MATH 114.
Terms: Aut | Units: 3

STATS 305B: Applied Statistics II

This course uses exponential family structure to motivate generalized linear models and other useful applied techniques including survival analysis methods and Bayes and empirical Bayes analyses. The lectures are based on a forthcoming book whose notes will be distributed. Prerequisites: 305A or consent of the instructor.
Terms: Win | Units: 3

STATS 305C: Applied Statistics III

Methods for multivariate responses. Theory, computation, and practice for multivariate statistical tools. Topics may include multivariate Gaussian models, probabilistic graphical models, MCMC and variational Bayesian inference, dimensionality reduction, principal components, factor analysis, independent components analysis, canonical correlations, linear discriminant analysis, hierarchical clustering, bi-clustering, multidimensional scaling and variants (e.g., Isomap, spectral clustering, t-SNE), matrix completion, topic modeling, and state space models. Extensive work with data involving programming, ideally in Python and/or R. Prerequisites: Stats 305A and Stats 305B or consent of the instructor.
Terms: Spr | Units: 3

STATS 307: Introduction to Time Series Analysis (STATS 207)

Time series models used in economics and engineering. Trend fitting, autoregressive and moving average models and spectral analysis, Kalman filtering, and state-space models. Seasonality, transformations, and introduction to financial time series. Prerequisite: basic course in Statistics at the level of 200.
Terms: Spr | Units: 3

STATS 310A: Theory of Probability I (MATH 230A)

Mathematical tools: sigma algebras, measure theory, connections between coin tossing and Lebesgue measure, basic convergence theorems. Probability: independence, Borel-Cantelli lemmas, almost sure and Lp convergence, weak and strong laws of large numbers. Large deviations. Weak convergence; central limit theorems; Poisson convergence; Stein's method. Prerequisites: STATS 116, MATH 171.
Terms: Aut | Units: 3

STATS 310B: Theory of Probability II (MATH 230B)

Conditional expectations, discrete time martingales, stopping times, uniform integrability, applications to 0-1 laws, Radon-Nikodym Theorem, ruin problems, etc. Other topics as time allows selected from (i) local limit theorems, (ii) renewal theory, (iii) discrete time Markov chains, (iv) random walk theory, (v) ergodic theory. http://statweb.stanford.edu/~adembo/stat-310b. Prerequisite: 310A or MATH 230A.
Terms: Win | Units: 3

STATS 310C: Theory of Probability III (MATH 230C)

Continuous time stochastic processes: martingales, Brownian motion, stationary independent increments, Markov jump processes and Gaussian processes. Invariance principle, random walks, LIL and functional CLT. Markov and strong Markov property. Infinitely divisible laws. Some ergodic theory. Prerequisite: 310B or MATH 230B. http://statweb.stanford.edu/~adembo/stat-310c/
Terms: Spr | Units: 3
Instructors: Dembo, A. (PI); Tung, N. (TA)

STATS 311: Information Theory and Statistics (EE 377)

Information theoretic techniques in probability and statistics. Fano, Assouad, and Le Cam methods for optimality guarantees in estimation. Large deviations and concentration inequalities (Sanov's theorem, hypothesis testing, the entropy method, concentration of measure). Approximation of (Bayes) optimal procedures, surrogate risks, f-divergences. Penalized estimators and minimum description length. Online game playing, gambling, no-regret learning. Prerequisites: EE 276 (or equivalent) or STATS 300A.
Terms: Aut | Units: 3

STATS 314A: Advanced Statistical Theory

This course will introduce the sum-of-squares algorithmic paradigm, focusing on its applications in statistics. It will touch on a wide range of topics including clustering, robust mean estimation, robust regression, mean-field approximations of Ising models, tensor decompositions for learning latent variable models, and information-computation gaps.
Last offered: Spring 2022 | Units: 3 | Repeatable for credit

STATS 315A: Modern Applied Statistics: Learning

Overview of supervised learning. Linear regression and related methods. Model selection, least angle regression and the lasso, stepwise methods. Classification. Linear discriminant analysis, logistic regression, and support vector machines (SVMs). Basis expansions, splines and regularization. Kernel methods. Generalized additive models. Kernel smoothing. Gaussian mixtures and the EM algorithm. Model assessment and selection: crossvalidation and the bootstrap. Pathwise coordinate descent. Sparse graphical models. Prerequisites: STATS 305A, 305B, 305C or consent of instructor.
Terms: Win | Units: 3

STATS 315B: Modern Applied Statistics: Learning II

Modern statistical machine learning topics moving beyond linear regression and classification. Decision trees (boosting, random forests) and deep learning techniques for non-linear regression and classification tasks. Discovering patterns and low-dimensional structure via unsupervised learning, including clustering, EM algorithm, PCA and factor analysis, (variational) autoencoding methods, and matrix factorization. Time series and sequence modeling via state space models and deep learning methods (recurrent neural networks, seq2seq models, transformers). Students entering the course are assumed to have foundational working knowledge in statistics, probability, and basic machine learning concepts, though the course has been designed to provide a broadly accessible treatment of the topics covered.
Last offered: Spring 2023 | Units: 3

STATS 316: Stochastic Processes on Graphs

Local weak convergence, Gibbs measures on trees, cavity method, and replica symmetry breaking. Examples include random k-satisfiability, the assignment problem, spin glasses, and neural networks. Prerequisite: 310A or equivalent. https://web.stanford.edu/~montanar/TEACHING/Stat316/stat316.html
Last offered: Autumn 2017 | Units: 1-3

STATS 317: Stochastic Processes

Semimartingales, stochastic integration, Ito's formula, Girsanov's theorem. Gaussian and related processes. Stationary/isotropic processes. Integral geometry and geometric probability. Maxima of random fields and applications to spatial statistics and imaging.
Terms: Win | Units: 3
Instructors: Li, S. (PI)

STATS 318: Modern Markov Chains (MATH 235)

Tools for understanding Markov chains as they arise in applications. Random walk on graphs, reversible Markov chains, Metropolis algorithm, Gibbs sampler, hybrid Monte Carlo, auxiliary variables, hit and run, Swendsen-Wang algorithms, geometric theory, Poincaré-Nash-Cheeger-Log-Sobolev inequalities. Comparison techniques, coupling, stationary times, Harris recurrence, central limit theorems, and large deviations.
Terms: Win | Units: 3

STATS 319: Literature of Statistics

Literature study of topics in statistics and probability culminating in oral and written reports. May be repeated for credit.
Terms: Aut, Win, Spr | Units: 1 | Repeatable for credit

STATS 320: Machine Learning Methods for Neural Data Analysis (CS 339N, NBIO 220, STATS 220)

With modern high-density electrodes and optical imaging techniques, neuroscientists routinely measure the activity of hundreds, if not thousands, of cells simultaneously. Coupled with high-resolution behavioral measurements, genetic sequencing, and connectomics, these datasets offer unprecedented opportunities to learn how neural circuits function. This course will study statistical machine learning methods for analysing such datasets, including: spike sorting, calcium deconvolution, and voltage smoothing techniques for extracting relevant signals from raw data; markerless tracking methods for estimating animal pose in behavioral videos; network models for connectomics and fMRI data; state space models for analysis of high-dimensional neural and behavioral time-series; point process models of neural spike trains; and deep learning methods for neural encoding and decoding. We will develop the theory behind these models and algorithms and then apply them to real datasets in the homeworks and final project. This course is similar to STATS 215: Statistical Models in Biology and STATS 366: Modern Statistics for Modern Biology, but it is specifically focused on statistical machine learning methods for neuroscience data. Prerequisites: Students should be comfortable with basic probability (STATS 116) and statistics (at the level of STATS 200). This course will place a heavy emphasis on implementing models and algorithms, so coding proficiency is required.
Last offered: Winter 2023 | Units: 3

STATS 322: Function Estimation in White Noise

Gaussian white noise model sequence space form. Hyperrectangles, quadratic convexity, and Pinsker's theorem. Minimax estimation on Lp balls and Besov spaces. Role of wavelets and unconditional bases. Linear and threshold estimators. Oracle inequalities. Optimal recovery and universal thresholding. Stein's unbiased risk estimator and threshold choice. Complexity penalized model selection. Connecting fast wavelet algorithms and theory. Beyond orthogonal bases.
Last offered: Spring 2023 | Units: 3

STATS 323: Sequential Analysis (STATS 223)

This course will survey the history of sequential analysis from its origin in the 1940s via its continuing role in clinical trials to current activity in machine learning. Subject to the limitations of time, the following topics will be discussed: parametric and semi-parametric hypothesis testing from Wald to sequential clinical trials; fixed precision estimation; change-point detection and estimation; iterative stochastic algorithms and machine learning; anytime-valid inference; optimal stopping, dynamic programming, and stochastic control; multi-armed bandits; applications. Prerequisites: for 223, Stats 200 or equivalent; for 323, Stats 300A and 310A.
Terms: Aut | Units: 3

STATS 324: Stein's Method

This course will teach the basics of Stein's method. The specific topics that will be covered are normal approximation, Poisson approximation, and concentration inequalities using Stein's method. If time permits, more advanced topics will be covered.
Last offered: Spring 2022 | Units: 3

STATS 325: Multivariate Analysis and Random Matrices in Statistics

Topics on Multivariate Analysis and Random Matrices in Statistics. Random matrices arise frequently in modern statistical theory, and tools reflecting their properties are the basis of many statistical tests and estimation procedures. Random Matrix theory is both an appealing branch of pure mathematics and an important engine for understanding many phenomena that appear in dealing with modern high-dimensional data. We will emphasize (a) phenomena - the strange things that can happen in high dimensions; (b) sightings - places where these phenomena appear and help explain puzzles in modern machine learning and statistics; (c) monuments - the central objects in the mathematical theory, their names and properties; (d) applications - ways that RMT helps statisticians and applied mathematicians in modern research.
Last offered: Winter 2022 | Units: 3

STATS 334: Mathematics and Statistics of Gambling (MATH 231)

Probability and statistics are founded on the study of games of chance. Nowadays, gambling (in casinos, sports and the Internet) is a huge business. This course addresses practical and theoretical aspects. Topics covered: mathematics of basic random phenomena (physics of coin tossing and roulette, analysis of various methods of shuffling cards), odds in popular games, card counting, optimal tournament play, practical problems of random number generation. Prerequisites: Statistics 116 and 200.
Last offered: Autumn 2020 | Units: 3

STATS 335: The Challenge Problems Paradigm in Empirical Machine Learning and Beyond

In many fields of science and technology, empirical research has been making rapid progress by implicitly following a little-studied research paradigm (CPP) with several distinctive features: a shared public database, a common task (for example, prediction of class labels or a response variable from given input features), an objective scoring rule that quantifies performance on that task, a leaderboard that tracks performance of submissions, and a set of enrolled competitors who each try to improve the current best-known performance on that task. In the context of Empirical Machine Learning, this is explicitly the famous "Kaggle" model; however, Kaggle didn't originate this approach, and many research disciplines use the same ingredients, in many cases implicitly or tacitly. As we know, the CPP anchored recent claims of progress in image understanding and in natural language processing. In this course we will review the many instances and variations on the CPP that exist in modern research, including not only in the standard areas of empirical machine learning (computer vision and natural language understanding) but also in academic empirical finance and computational hard sciences. We will discuss evidence that the CPP itself is a kind of secret sauce, rather than the specific technologies that are spotlighted because of CPP. We will discuss software platforms implementing CPP, including Kaggle, but also academic platforms like CodaLab, which is often used for challenge problems in natural language processing, and Nightingale Open Science, which is used for challenge problems involving potentially protected health information. Prerequisite: an introductory statistics or machine learning course.
Terms: Aut | Units: 3
Instructors: Donoho, D. (PI)

STATS 345: Statistical and Machine Learning Methods for Genomics (BIO 268, BIOMEDIN 245, CS 373)

Introduction to statistical and computational methods for genomics. Sample topics include: expectation maximization, hidden Markov model, Markov chain Monte Carlo, ensemble learning, probabilistic graphical models, kernel methods and other modern machine learning paradigms. Rationales and techniques illustrated with existing implementations used in population genetics, disease association, and functional regulatory genomics studies. Instruction includes lectures and discussion of readings from primary literature. Homework and projects require implementing some of the algorithms and using existing toolkits for analysis of genomic datasets.
Last offered: Winter 2020 | Units: 3

STATS 350: Topics in Probability Theory

See http://statweb.stanford.edu/~adembo/stat-350/concentration/ Selected topics of contemporary research interest in probability theory. May be repeated once for credit. Prerequisite: 310A or equivalent.
Last offered: Autumn 2017 | Units: 3 | Repeatable 2 times (up to 6 units total)

STATS 352: Topics in Computing for Data Science (BIODS 352)

A seminar-style course with lectures on a range of computational topics important for modern data-intensive science, jointly supported by the Statistics department and Stanford Data Science, and suitable for advanced undergraduate/graduate students engaged in either research on data science techniques (statistical or computational, for example) or research in scientific fields relying on advanced data science to achieve its goals. Seminars will alternate a presentation of a topic, usually by an expert on that topic, typically leading to exercises applying the techniques, with a follow up lecture to further discuss the topic and the exercises. Prerequisites: Understanding of basic modern data science and competence in related programming, e.g., in R or Python. https://stats352.stanford.edu/
Terms: Spr | Units: 1

STATS 359: Topics in Mathematical Physics (MATH 273)

Covers a list of topics in mathematical physics. The specific topics may vary from year to year, depending on the instructor's discretion. Background in graduate level probability theory and analysis is desirable.
Last offered: Autumn 2018 | Units: 3 | Repeatable for credit

STATS 361: Causal Inference

This course covers statistical underpinnings of causal inference, with a focus on experimental design and data-driven decision making. Topics include randomization, potential outcomes, observational studies, propensity score methods, matching, double robustness, semiparametric efficiency, treatment heterogeneity, structural models, instrumental variables, principal stratification, mediation, regression discontinuities, synthetic controls, interference, sensitivity analysis, policy learning, dynamic treatment rules, invariant prediction, graphical models, and structure learning. We will also discuss the relevance of optimization and machine learning tools to causal inference. Prerequisite: STATS 300A and STATS 300B, or equivalent graduate-level coursework on the theory of statistics.
Terms: Spr | Units: 3
Instructors: Wager, S. (PI); Jing, A. (TA)

STATS 362: Topic: Monte Carlo

Random numbers and vectors: inversion, acceptance-rejection, copulas. Variance reduction: antithetics, stratification, control variates, importance sampling. MCMC: Markov chains, detailed balance, Metropolis-Hastings, random walk Metropolis, independence sampler, Gibbs sampling, slice sampler, hybrids of Gibbs and Metropolis, tempering. Sequential Monte Carlo. Quasi-Monte Carlo. Randomized quasi-Monte Carlo. Examples, problems and motivation from Bayesian statistics, machine learning, computational finance and graphics. May be repeated for credit.
Terms: Win | Units: 3
Instructors: Owen, A. (PI); Pan, Z. (TA)

STATS 363: Design of Experiments (STATS 263)

Experiments vs observation. Confounding. Randomization. ANOVA. Blocking. Latin squares. Factorials and fractional factorials. Split plot. Response surfaces. Mixture designs. Optimal design. Central composite. Box-Behnken. Taguchi methods. Computer experiments and space filling designs. Prerequisites: probability at STATS 116 level or higher, and at least one course in linear models.
Last offered: Autumn 2022 | Units: 3

STATS 364: Theory and Applications of Selective Inference

This course focuses on the problem of inference in the presence of multiplicity or selection. Topics covered include classical topics in multiple comparisons (FWER, FDR, FCR) as well as newer methods such as knockoffs. We will also cover inference when targeted parameters are determined only after inspection of the data, considering both conditional and simultaneous approaches. Both theoretical and computational considerations will be stressed throughout the course. Prerequisite: STATS 200 or equivalent.
Last offered: Spring 2020 | Units: 3

STATS 365: Empirical Likelihood

Empirical likelihood (EL) allows likelihood based inferences without assuming any parametric form for the likelihood. It is based instead on reweighting the sample values. It provides data driven shapes for confidence regions and confidence bands. EL tests have competitive power. EL has recently been used in causal inference, reinforcement learning and distributionally robust inference. This course covers: nonparametric maximum likelihood and likelihood ratios, censoring and truncation, biased sampling, estimating equations, GMM, Bayesian bootstrap, Euclidean and Kullback-Leibler log likelihoods and recent research directions.
Last offered: Spring 2023 | Units: 3

STATS 366: Modern Statistics for Modern Biology (BIOS 221, STATS 256)

Application based course in nonparametric statistics. Modern toolbox of visualization and statistical methods for the analysis of data, examples drawn from immunology, microbiology, cancer research and ecology. Methods covered include multivariate methods (PCA and extensions), sparse representations (trees, networks, contingency tables) as well as nonparametric testing (Bootstrap, permutation and Monte Carlo methods). Hands on, use R and cover many Bioconductor packages. Prerequisite: Working knowledge of R and two core Biology courses. Note that the 155 offering is a writing intensive course for undergraduates only and requires instructor consent. (WIM). See https://web.stanford.edu/class/bios221/index.html
Terms: Aut | Units: 3

STATS 367: Statistical Models in Genetics

This course will cover statistical problems in population genetics and molecular evolution with an emphasis on coalescent theory. Special attention will be paid to current research topics, illustrating the challenges presented by genomic data obtained via high-throughput technologies. No prior knowledge of genomics is necessary. Familiarity with the R statistical package or other computing language is needed for homework assignments. Prerequisites: knowledge of probability through elementary stochastic processes and statistics through likelihood theory.
Last offered: Winter 2018 | Units: 3

STATS 368: Empirical Process Theory and its Applications

This course is on the theory of empirical processes. In the course we will focus on weak convergence of stochastic processes, M-estimation and empirical risk minimization. The course will cover topics like covering numbers and bracketing numbers, maximal inequalities, chaining and symmetrization, uniform law of large numbers and uniform central limit theorems, rates of convergence of MLEs and (penalized) least squares estimators, and concentration inequalities.
Last offered: Spring 2017 | Units: 3

STATS 369: Methods from Statistical Physics

Mathematical techniques from statistical physics have been applied with increasing success to problems from combinatorics, computer science, and machine learning. These methods are non-rigorous, but in several cases they have been proved to yield correct predictions. This course provides a working knowledge of these methods for non-physicists. Specific topics: the Sherrington-Kirkpatrick model; sparse regression with random designs;
Last offered: Autumn 2021 | Units: 3

STATS 370: Bayesian Statistics (STATS 270)

This course will treat Bayesian statistics at a relatively advanced level. Assuming familiarity with standard probability and multivariate distribution theory, we will provide a discussion of the mathematical and theoretical foundation for Bayesian inferential procedures. In particular, we will examine the construction of priors and the asymptotic properties of likelihoods and posterior distributions. The discussion will include but will not be limited to the case of finite dimensional parameter space. There will also be some discussions on the computational algorithms useful for Bayesian inference. Prerequisites: Stats 116 or equivalent probability course, plus basic programming knowledge; basic calculus, analysis and linear algebra strongly recommended; Stats 200 or equivalent statistical theory course desirable.
Terms: Spr | Units: 3
Instructors: Wong, W. (PI); Lu, S. (TA)

STATS 371: Applied Bayesian Statistics (STATS 271)

This course is a modern treatment of applied Bayesian statistics with a focus on high-dimensional problems. We will study a collection of canonical methods that see heavy use in applications, including high-dimensional linear and generalized linear models, hierarchical/random effects models, Gaussian processes, variable-dimension and Dirichlet process mixtures, graphical models, and methods used in Bayesian inverse problems. Each method will be accompanied by one or more motivating datasets. Through these examples the course will cover: (1) Bayesian hypothesis testing, multiplicity correction, selection, shrinkage, and model averaging; (2) prior choice; (3) Frequentist properties of Bayesian procedures in high dimensions; and (4) computation by Markov chain Monte Carlo, including constructing efficient Gibbs, Metropolis, and more exotic samplers, empirical convergence analysis, strategies for scaling computation to high dimensions (approximations, divide-and-conquer, minibatching, et cetera), and the theory of convergence rates.
Last offered: Spring 2021 | Units: 3

STATS 374: Large Deviations Theory (MATH 234)

Combinatorial estimates and the method of types. Large deviation probabilities for partial sums and for empirical distributions, Cramer's and Sanov's theorems and their Markov extensions. Applications in statistics, information theory, and statistical mechanics. Prerequisite: MATH 230A or STATS 310. Offered every 2-3 years. http://statweb.stanford.edu/~adembo/large-deviations/
Last offered: Spring 2019 | Units: 3

STATS 375: Mathematical Problems in Machine Learning (MATH 276)

Mathematical tools to understand modern machine learning systems. Generalization in machine learning, the classical view: uniform convergence, Rademacher complexity. Generalization from stability. Implicit (algorithmic) regularization. Infinite-dimensional models: reproducing kernel Hilbert spaces. Random features approximations to kernel methods. Connections to neural networks, and neural tangent kernel. Nonparametric regression. Asymptotic behavior of wide neural networks. Properties of convolutional networks. Prerequisites: EE364A or equivalent; Stat310A or equivalent.
Terms: Spr | Units: 3
Instructors: Montanari, A. (PI)

STATS 376B: Topics in Information Theory and Its Applications (EE 376B)

Information theory establishes the fundamental limits on compression and communication over networks. The tools of information theory have also found applications in many other fields, including probability and statistics, computer science and physics. The course will cover selected topics from these applications, including communication networks, through regular lectures and student projects. Prerequisites: EE276 (Formerly EE376A)
Last offered: Spring 2019 | Units: 3

STATS 385: Analyses of Deep Learning

Deep learning is a transformative technology that has delivered impressive improvements in image classification and speech recognition. Many researchers are trying to better understand how to improve prediction performance and also how to improve training methods. Some researchers use experimental techniques; others use theoretical approaches. In this course we will review both experimental and theoretical analyses of deep learning. We will have 8-10 guest lecturers as well as graded projects for those who take the course for credit.
Last offered: Autumn 2019 | Units: 1

STATS 390: Consulting Workshop

Skills required of practicing statistical consultants, including exposure to statistical applications. Students participate as consultants in the department's drop-in consulting service, analyze client data, and prepare formal written reports. Seminar provides supervised experience in short term consulting. May be repeated for credit. Prerequisites: graduate course work in applied statistics or data analysis, and consent of instructor.
Terms: Aut, Win, Spr, Sum | Units: 1 | Repeatable for credit

STATS 397: PhD Oral Exam Workshop

For Statistics PhD students defending their dissertation.
Last offered: Spring 2020 | Units: 1

STATS 398: Industrial Research for Statisticians

Doctoral research as in 399, but must be conducted for an off-campus employer. A final report acceptable to the advisor, outlining work activity, problems investigated, key results, and any follow-up projects the student expects to perform, is required. The report is due at the end of the quarter in which the course is taken. May be repeated for credit. Prerequisite: Statistics Ph.D. candidate. IMPORTANT: F-1 international students enrolled in this CPT course cannot start working without first obtaining a CPT-endorsed I-20 from Bechtel International Center (enrolling in the CPT course alone is insufficient to meet federal immigration regulations).
Terms: Aut, Win, Spr, Sum | Units: 1 | Repeatable for credit