CS 2281R: Mathematical & Engineering Principles
for Training Foundation Models
(aka "How to train your foundation model")
Important info:
In order to receive a timely response, please contact the staff via Ed (once we open Ed up). It is the students' responsibility to regularly follow Ed for course announcements. The course will have pre-readings (posted on the website) before each lecture, including for the first lecture.
Instructor: Sham Kakade
TFs:
Aayush Karan, Clara Mohri, Han Qi
Lecture time: Thursday 3:45pm - 6:30pm
Lecture location: SEC 1.402
Office hours:
- Aayush Karan: Monday 3-4PM in front of SEC 3.302
- Clara Mohri: Tuesday 1:30-2:30PM in front of SEC 2.348
- Han Qi: Friday 5-6PM at SEC 3.425
Links: Ed, Gradescope
The goal of this course is to prepare students to understand the principles behind building foundation models, both LLMs and generative AI models more broadly. By the end of the course, you should understand the terminology, principles, and current best practices for how these models are trained, from systems-level issues to data collection to mathematical and optimization issues. Ideally, this should help you better identify and pursue relevant research questions in this space. Towards this end, the expectation is that students will also engage in substantial independent study, including both self-study and peer study.
We will have a number of experts from industry giving guest
lectures on relevant topics as well.
Prerequisites
This is a rapidly evolving area, and this will be a fast-moving course. We aim to cover both technical details and best practices, as well as to have a substantive discussion of why certain approaches are taken. The lectures will be a mix of theory and engineering/systems-level design, so it is important for students to have both a strong ML background and a strong programming background. Through self/peer study, students are encouraged to rapidly catch up on any relevant material.
You should be familiar with topics such as empirical and population loss, gradient descent, neural networks, linear regression, principal component analysis, etc. On the applied side, you should be comfortable with Python programming and be able to train a neural network.
(tentative) Grading and Course Requirements
The course will have 3 homeworks, largely programming-oriented, though they may have some written/math components. There will also be a course project. All homeworks and the project must be submitted in order to pass the class. The course is intended for graduate students or advanced undergraduates who have mostly completed their requirements and are deeply interested in the material. The grading weights will be 60% homework and 40% project.
In order to pass the course, you must attempt and submit all homeworks, even if they are submitted for zero credit (as per the late day policy below). Each student will have 96 cumulative hours of late time (as measured on Gradescope) that will be forgiven. After this cumulative allowance has been used up, any assignment that is turned in late will receive a 20% deduction per day. Furthermore, at most 48 hours of late time may be used on any one assignment.
Course Project
See the Project Page for more information.
Diversity and Inclusiveness
While many academic disciplines have historically been dominated by one cross-section of society, the study of and participation in STEM disciplines is a joy that the instructors hope everyone can pursue, regardless of their socio-economic background, race, gender, etc. We encourage students to both be mindful of these issues and, in good faith, try to take steps to fix them. You are the next generation here.
You should expect to be treated by your classmates and the course staff with respect. We subscribe to Harvard's Values for Inclusion.
Schedule (tentative)
Lecture 1: Thurs, September 5
Instructor: Sham Kakade, Nikhil Anand
Topics: Automatic differentiation & checkpointing; some parallelization and compute primitives
Slides: pdf, annotated pdf
Required Reading:
Supplementary Reading:
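To give a flavor of the Lecture 1 material, here is a minimal, hypothetical sketch of activation checkpointing in PyTorch; the toy Block module and tensor sizes are invented for illustration, and this is not course-provided code:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """Toy residual-free block standing in for a transformer layer."""
    def __init__(self, dim):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

blocks = torch.nn.ModuleList([Block(64) for _ in range(4)])
x = torch.randn(8, 64, requires_grad=True)

h = x
for blk in blocks:
    # Recompute this block's activations during the backward pass instead
    # of storing them, trading extra compute for lower peak memory.
    h = checkpoint(blk, h, use_reentrant=False)

h.sum().backward()
print(x.grad.shape)  # gradients flow through the recomputed graph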
Lecture 2: Thurs, September 12
Instructor: Sham Kakade, Depen Morwani, Nikhil Vyas
Topics: Optimization
Slides: pdf, annotated pdf
Required Reading:
Supplementary Reading:
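As a small, hypothetical illustration of the kind of setup Lecture 2 discusses, the sketch below wires up AdamW with linear warmup followed by cosine decay, a common recipe in LLM pretraining; the model, data, and hyperparameters are toy placeholders:

```python
import math
import torch

# Toy model and AdamW optimizer; all hyperparameters are illustrative only.
model = torch.nn.Linear(64, 64)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

total_steps, warmup_steps = 1000, 100

def lr_scale(step):
    # Linear warmup to the peak learning rate, then cosine decay to zero.
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_scale)

for step in range(5):  # a few dummy steps on random data
    x = torch.randn(8, 64)
    loss = (model(x) - x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    # Gradient clipping is standard practice for stabilizing training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
    sched.step()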
Lecture 3: Thurs, September 19
Instructor: Jonathan Frankle, Databricks
Topics: TBD
Lecture 4: Thurs, September 26
Instructor: Roy Frostig (core JAX developer)
Topics: Sharding and AD
Lecture 5: Thurs, October 3
Instructor: David Brandfonbrener, Sham Kakade
Topics: Pre/mid/post training + Reasoning/o1
Slides: pdf
Supplementary Reading: see hyperlinks in slides
Lecture 6: Thurs, October 10
Instructor: Julia Neagu (CEO) and Deanna Emery, QuotientAI
Topics: Evaluation
Slides: pdf
Supplementary Reading:
Lecture 7: Thurs, October 17
Instructor: Vahan Petrosyan (CEO) and Leo Lindén (PMM), SuperAnnotate
Topics: Annotation
Slides: pdf
Lecture 8: Thurs, October 24
Instructor: Sham Kakade, Nikhil Anand (+ Yasin Mazloumi for slides help)
Topics: Parallelization (FSDP/ZeRO-3, Pipeline, Tensor, Sequence, 3D/4D)
Slides: pdf, annotated pdf
Supplementary Reading:
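As a single-process, hypothetical toy for the tensor-parallel idea in Lecture 8, the sketch below splits a weight matrix's output columns across simulated workers; in a real multi-GPU setup the concatenation would be an all-gather via torch.distributed, and FSDP/ZeRO-3 would additionally shard parameters, gradients, and optimizer state:

```python
import torch

torch.manual_seed(0)
d_in, d_out, world_size = 64, 128, 4  # toy sizes; "workers" are simulated

W = torch.randn(d_in, d_out)
x = torch.randn(8, d_in)

# Column-parallel linear layer: each worker holds a slice of W's columns
# and computes only its shard of the matmul.
shards = W.chunk(world_size, dim=1)
partials = [x @ w_shard for w_shard in shards]  # each worker's local matmul
y_parallel = torch.cat(partials, dim=1)         # stand-in for an all-gather

# The sharded computation matches the unsharded one.
assert torch.allclose(y_parallel, x @ W, atol=1e-5)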
Lecture 9: Thurs, October 31
Instructor: Yilun Du
Topics: Generative AI, part 1
Slides: pdf
Lecture 10: Thurs, November 7
Instructor: Horace He,
PyTorch
Topics: ML+Systems
Slides: pdf
Lecture 11: Thurs, November 14
Instructor: Michael Albergo
Topics: Generative AI, part 2
Material: Transport Notes
Lecture 12: Thurs, November 21
Instructor: Jacob Austin, Google DeepMind
Topics: Inference & Serving Models
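Finally, as a hypothetical toy for the Lecture 12 material, the sketch below runs a single-head attention decode loop with a KV cache, the basic mechanism that lets served models generate each new token without recomputing attention over the whole prefix; all shapes and weights here are invented for illustration:

```python
import torch

torch.manual_seed(0)
d = 32  # toy hidden size
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per decoded token
h = torch.randn(1, d)      # stand-in embedding of the first token

for step in range(5):
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    k_cache.append(k)      # cache this step's key/value so the prefix
    v_cache.append(v)      # never has to be recomputed on later steps
    K = torch.cat(k_cache)  # (t, d): all keys so far
    V = torch.cat(v_cache)
    attn = torch.softmax(q @ K.T / d ** 0.5, dim=-1)
    h = attn @ V            # stand-in for the next token's hidden state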