CS 2281R: Mathematical & Engineering Principles for Training Foundation Models
(aka "How to train your foundation model")


Important info: In order to receive a timely response, please contact the staff via Ed (once we open Ed up). It is the students' responsibility to regularly follow Ed for course announcements. The course will have pre-readings (posted on the website) before each lecture, including for the first lecture.


Instructor: Sham Kakade

TFs: Aayush Karan, Clara Mohri, Han Qi

Lecture time: Thursday 3:45pm - 6:30pm

Lecture location: SEC 1.402

Office hours:

  • Aayush Karan: Monday 3-4PM in front of SEC 3.302
  • Clara Mohri: Tuesday 1:30-2:30PM in front of SEC 2.348
  • Han Qi: Friday 5-6PM at SEC 3.425

Links: Ed, Gradescope


The goal of this course is to prepare students to understand the principles behind building foundation models, both LLMs and generative AI models more broadly. By the end of the course, you should be able to understand the terminology, principles, and current best practices for how these models are trained, from systems-level issues to data collection to mathematical and optimization issues. Ideally, this should provide better guidance for pursuing relevant research questions in this space. Towards this end, the expectation is that students will also engage in substantial independent study, including both self-study and peer study. We will also have a number of experts from industry giving guest lectures on relevant topics.

Prerequisites

This is a rapidly evolving area, and it will be a fast-moving course. We aim to cover both technical details and best practices, as well as to discuss in substance why certain approaches are taken. The lectures will be a mix of theory and engineering/systems-level design, so it will be important for students to have both a strong ML background and a strong programming background. Through self/peer study, students are encouraged to rapidly catch up on any relevant material.

You should be familiar with topics such as empirical and population loss, gradient descent, neural networks, linear regression, principal component analysis, etc. On the applied side, you should be comfortable with Python programming and be able to train a neural network.
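As a rough self-check on the applied side, the sketch below shows the level of fluency expected: a minimal PyTorch training loop on a made-up toy regression task (the data, architecture, and hyperparameters here are purely illustrative, not course material).

    import torch
    import torch.nn as nn

    # Toy data (illustrative only): learn y = sum(x) from random inputs.
    X = torch.randn(256, 10)
    y = X.sum(dim=1, keepdim=True)

    # A small two-layer network.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    # Standard loop: forward pass, loss, backward pass, parameter update.
    for step in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()

    print(f"final training loss: {loss.item():.4f}")

If writing and debugging a loop like this is comfortable for you, you have the applied baseline the course assumes.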

(tentative) Grading and Course Requirements

The course will have 3 homeworks, largely programming-oriented, though they may include some written/math components. There will also be a course project. All homeworks and the project must be submitted in order to pass the class. The course is intended for graduate students, or advanced undergraduate students who have mostly completed their requirements, who are deeply interested in the material. The weighting for grading will be 60% homework and 40% project.

In order to pass the course, you must attempt and submit all homeworks, even if they are submitted for zero credit (as per the late day policy below). Each student has 96 cumulative hours of late time (as measured on Gradescope) that will be forgiven. After this cumulative allowance is used up, any assignment that is turned in late will receive 20% off per day. Furthermore, only up to 48 hours of late time may be used on any one assignment. For example, submitting one homework 48 hours late and another 48 hours late uses the entire 96-hour allowance; any further lateness would incur the 20%-per-day penalty.

Course Project

See the Project Page for more information.

Diversity and Inclusiveness

While many academic disciplines have historically been dominated by one cross-section of society, the study of and participation in STEM disciplines is a joy that the instructors hope everyone can pursue, regardless of their socio-economic background, race, gender, etc. We encourage students to both be mindful of these issues and, in good faith, try to take steps to fix them. You are the next generation here. You should expect to be treated by your classmates and the course staff with respect. We subscribe to Harvard's Values for Inclusion.


Schedule (tentative)

Lecture 1: Thurs, September 5

Instructors: Sham Kakade, Nikhil Anand

Topics: Automatic differentiation & checkpointing; some parallelization and compute primitives

Slides: pdf, annotated pdf

Required Reading:

Supplementary Reading:

Lecture 2: Thurs, September 12

Instructors: Sham Kakade, Depen Morwani, Nikhil Vyas

Topics: Optimization

Slides: pdf, annotated pdf

Required Reading:

Supplementary Reading:

Lecture 3: Thurs, September 19

Instructor: Jonathan Frankle, Databricks

Topics: TBD

Lecture 4: Thurs, September 26

Instructor: Roy Frostig (core JAX developer)

Topics: Sharding and AD

Lecture 5: Thurs, October 3

Instructors: David Brandfonbrener, Sham Kakade

Topics: Pre/mid/post training + Reasoning/o1

Slides: pdf

Supplementary Reading: see hyperlinks in slides

Lecture 6: Thurs, October 10

Instructors: Julia Neagu (CEO) and Deanna Emery, QuotientAI

Topics: Evaluation

Slides: pdf

Supplementary Reading:

Lecture 7: Thurs, October 17

Instructors: Vahan Petrosyan (CEO) and Leo Lindén (PMM), SuperAnnotate

Topics: Annotation

Slides: pdf

Lecture 8: Thurs, October 24

Instructors: Sham Kakade, Nikhil Anand (+ Yasin Mazloumi for slides help)

Topics: Parallelization (FSDP/ZeRO-3, Pipeline, Tensor, Sequence, 3D/4D)

Slides: pdf, annotated pdf

Supplementary Reading:

Lecture 9: Thurs, October 31

Instructor: Yilun Du

Topics: Generative AI, part 1

Slides: pdf

Lecture 10: Thurs, November 7

Instructor: Horace He, PyTorch

Topics: ML+Systems

Slides: pdf

Lecture 11: Thurs, November 14

Instructor: Michael Albergo

Topics: Generative AI, part 2

Material: Transport Notes

Lecture 12: Thurs, November 21

Instructor: Jacob Austin, Google DeepMind

Topics: Inference & Serving Models