STATS 354: Generalization and Causality in Biohealth (CS 273D)
While modern machine learning models often achieve superhuman performance on biohealth benchmarks, they frequently fail to generalize to new hospitals, patient populations, or biological contexts. This course investigates the theoretical and practical foundations of generalizable inference in biomedicine, focusing on the critical gap between predictive performance and mechanistic validity. We will examine how to build "world models" that leverage biological structure, enabling generalization beyond the training distribution. Key topics include: inductive biases (biologically relevant priors), causal representation learning (discovering latent state variables), hybrid models (combining mechanistic ODEs with neural networks), learning from interventional data (from high-throughput perturbation screens to policy learning from clinical interventions), and causal transportability. Students will engage with cutting-edge literature, dissecting success stories and analyzing "failure modes" where black-box models fall short of clinical reality.
Terms: Spr
| Units: 1-3
Instructors:
Fox, E. (PI)
STATS 357: Reliability and Validity in Artificial Intelligence (MS&E 330)
This course examines the principles and methods required to make artificial intelligence (AI) systems reliable and scientifically sound. Topics include evaluation and benchmarking, notions of validity, distribution shift, causality, predictive inference, AI-assisted statistical inference, data attribution, and beyond. Problem sets will involve both mathematical components and coding projects to see the practical effects of the methods we develop.
Terms: Spr
| Units: 3
Instructors:
Zrnic, T. (PI)
STATS 361: Causal Inference
This course covers statistical underpinnings of causal inference, with a focus on experimental design and data-driven decision making. Topics include randomization, potential outcomes, observational studies, propensity score methods, matching, double robustness, semiparametric efficiency, treatment heterogeneity, structural models, instrumental variables, principal stratification, mediation, regression discontinuities, synthetic controls, interference, sensitivity analysis, policy learning, dynamic treatment rules, invariant prediction, graphical models, and structure learning. We will also discuss the relevance of optimization and machine learning tools to causal inference. Prerequisite:
STATS 300A and
STATS 300B, or equivalent graduate-level coursework on the theory of statistics.
Terms: Spr
| Units: 3
Instructors:
Rothenhaeusler, D. (PI)
STATS 362: Topic: Monte Carlo
Random numbers and vectors: inversion, acceptance-rejection, copulas. Variance reduction: antithetics, stratification, control variates, importance sampling. MCMC: Markov chains, detailed balance, Metropolis-Hastings, random walk Metropolis, independence sampler, Gibbs sampling, slice sampler, hybrids of Gibbs and Metropolis, tempering. Sequential Monte Carlo. Quasi-Monte Carlo. Randomized quasi-Monte Carlo. Examples, problems and motivation from Bayesian statistics, machine learning, computational finance and graphics. May be repeated for credit.
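To give a flavor of one MCMC topic listed above, here is a minimal random walk Metropolis sketch (an illustration only, not course material; the target density and step size are arbitrary choices):

```python
import math
import random

def metropolis_rw(log_target, x0, step, n_steps, seed=0):
    """Random walk Metropolis: propose x' = x + step * N(0, 1),
    accept with probability min(1, pi(x') / pi(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + step * rng.gauss(0.0, 1.0)
        lp_prop = log_target(prop)
        # Accept/reject in log space to avoid under/overflow.
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Target: standard normal, log density up to an additive constant.
samples = metropolis_rw(lambda x: -0.5 * x * x, x0=0.0, step=1.0, n_steps=50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because only the ratio of target densities appears in the acceptance step, the normalizing constant is never needed, which is exactly what makes Metropolis-Hastings useful for Bayesian posteriors.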
Terms: Spr
| Units: 3
Instructors:
Owen, A. (PI)
STATS 363: Design of Experiments (STATS 263)
Experiments vs observation. Confounding. Randomization. ANOVA. Blocking. Latin squares. Factorials and fractional factorials. Split plot. Response surfaces. Mixture designs. Optimal design. Central composite. Box-Behnken. Taguchi methods. Computer experiments and space filling designs. Prerequisites:
STATS 116/118,
STATS 191/203. See
https://statistics.stanford.edu/course-equiv for equivalent courses in other departments that satisfy these prerequisites.
Last offered: Winter 2025
| Units: 3
STATS 365: Empirical Likelihood
Empirical likelihood (EL) allows likelihood based inferences without assuming any parametric form for the likelihood. It is based instead on reweighting the sample values. It provides data driven shapes for confidence regions and confidence bands. EL tests have competitive power. EL has recently been used in causal inference, reinforcement learning and distributionally robust inference. This course covers: nonparametric maximum likelihood and likelihood ratios, censoring and truncation, biased sampling, estimating equations, GMM, Bayesian bootstrap, Euclidean and Kullback-Leibler log likelihoods and recent research directions.
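As a small illustration of the reweighting idea described above, the sketch below computes the empirical log-likelihood ratio statistic −2 log R(μ₀) for a mean, solving for the Lagrange multiplier by bisection (function names and tolerances are my own; this is a sketch, not course material):

```python
import math

def el_log_ratio(x, mu0, tol=1e-10):
    """Empirical likelihood statistic -2 log R(mu0) for the mean of x.
    Maximizes prod(n * w_i) subject to sum(w_i * (x_i - mu0)) = 0;
    mu0 must lie strictly inside the range of the data."""
    d = [xi - mu0 for xi in x]
    dmax, dmin = max(d), min(d)
    if not (dmin < 0 < dmax):
        raise ValueError("mu0 must lie inside the convex hull of the data")
    # Feasible multipliers keep every weight positive: 1 + lam * d_i > 0.
    lo = -1.0 / dmax + tol
    hi = -1.0 / dmin - tol
    g = lambda lam: sum(di / (1.0 + lam * di) for di in d)
    # g is strictly decreasing on the feasible interval; bisect g(lam) = 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # With w_i = 1 / (n * (1 + lam * d_i)), -2 log R = 2 * sum log(1 + lam * d_i).
    return 2.0 * sum(math.log1p(lam * di) for di in d)

stat_at_mean = el_log_ratio([1, 2, 3, 4], 2.5)   # sample mean: statistic is ~0
stat_off_mean = el_log_ratio([1, 2, 3, 4], 2.0)  # away from the mean: positive
```

The statistic is zero at the sample mean (all weights equal 1/n) and grows as μ₀ moves toward the edge of the data, which is what produces the data-driven confidence region shapes mentioned above.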
Last offered: Spring 2023
| Units: 3
STATS 369: Methods from Statistical Physics
Mathematical techniques from statistical physics have been applied with increasing success to problems from combinatorics, computer science, and machine learning. These methods are non-rigorous, but in several cases they have been proven to yield correct predictions. This course provides a working knowledge of these methods for non-physicists. Specific topic: Lattice Gauge Theories.
Terms: Aut
| Units: 3
Instructors:
Chatterjee, S. (PI)
STATS 370: Bayesian Statistics (STATS 270)
This course will treat Bayesian statistics at a relatively advanced level. Assuming familiarity with standard probability and multivariate distribution theory, we will provide a discussion of the mathematical and theoretical foundation for Bayesian inferential procedures. In particular, we will examine the construction of priors and the asymptotic properties of likelihoods and posterior distributions. The discussion will include but will not be limited to the case of finite dimensional parameter space. There will also be some discussions on the computational algorithms useful for Bayesian inference. Prerequisites:
Stats 116 or equivalent probability course, plus basic programming knowledge; basic calculus, analysis and linear algebra strongly recommended;
Stats 200 or equivalent statistical theory course desirable.
Terms: Spr
| Units: 3
Instructors:
Wong, W. (PI)
STATS 371: Applied Bayesian Statistics (STATS 271)
This course is a modern treatment of applied Bayesian statistics with a focus on high-dimensional problems. We will study a collection of canonical methods that see heavy use in applications, including high-dimensional linear and generalized linear models, hierarchical/random effects models, Gaussian processes, variable-dimension and Dirichlet process mixtures, graphical models, and methods used in Bayesian inverse problems. Each method will be accompanied by one or more motivating datasets. Through these examples the course will cover: (1) Bayesian hypothesis testing, multiplicity correction, selection, shrinkage, and model averaging; (2) prior choice; (3) frequentist properties of Bayesian procedures in high dimensions; and (4) computation by Markov chain Monte Carlo, including constructing efficient Gibbs, Metropolis, and more exotic samplers, empirical convergence analysis, strategies for scaling computation to high dimensions (approximations, divide-and-conquer, minibatching, et cetera), and the theory of convergence rates.
Last offered: Spring 2025
| Units: 3
STATS 375: Mathematical Problems in Machine Learning (MATH 276)
Mathematical tools to understand modern machine learning systems. Generalization in machine learning, the classical view: uniform convergence, Rademacher complexity. Generalization from stability. Implicit (algorithmic) regularization. Infinite-dimensional models: reproducing kernel Hilbert spaces. Random features approximations to kernel methods. Connections to neural networks, and neural tangent kernel. Nonparametric regression. Asymptotic behavior of wide neural networks. Properties of convolutional networks. Prerequisites: EE364A or equivalent; Stat310A or equivalent. NOTE: Undergraduates require instructor permission to enroll. Undergraduates interested in taking the course should contact the instructor for permission, providing information about relevant background such as performance in prior coursework, reading, etc.
Last offered: Spring 2024
| Units: 3
