CS 2281R: Mathematical & Engineering Principles for Training Foundation Models
(aka "How to train your foundation model")

Important info: In order to receive a timely response, please contact the staff via Ed (once we open Ed up). It is the students' responsibility to regularly follow Ed for course announcements. The course will have pre-readings (posted on the website) before each lecture, including for the first lecture.

Instructor: Sham Kakade

TFs: Aayush Karan, Clara Mohri, Han Qi

Lecture time: Thursday 3:45pm - 6:30pm

Lecture location: SEC 1.402

Office hours:

Aayush Karan: Monday 3-4PM in front of SEC 3.302
Clara Mohri: Tuesday 1:30-2:30PM in front of SEC 2.348
Han Qi: Friday 5-6PM at SEC 3.425

Links: Ed Gradescope

The goal of this course is to prepare students to able to understand the principles behind building foundation models, both LLMs and for generative AI. By the end of the course, you should be able to understand the terminology, principles, and current best practices for how these models are trained, from systems-level issues to data collection to mathematical and optimization issues. Ideally, this should help in better guidance for relevant research questions in this space. Towards this end, the expectation is that students will also engage in substantial independent study, including both self-study and peer study. We will have a number of experts from industry giving guest lectures on relevant topics as well.

Prerequisites

This is a rapidly evolving area and it will be a fast-moving course. We aim to cover both technical details and best practices, as well as having a substantive discussion of why certain approaches are taken. The lectures will be a mix of theory and understanding the engineering/systems-level designs, so it will be important for students to have both a strong ML background and also a programming background. Through self/peer study, students are encouraged to rapidly catch up on any relevant material.

You should be familiar with topics such as empirical and population loss, gradient descent, neural networks, linear regression, principal component analysis, etc. On the applied side, you should be comfortable with Python programming and be able to train a neural network.

(tentative) Grading and Course Requirements

The course will have 3 homeworks, largely programming oriented though may have some written/math components. There will also be a course project. All homework and a project must be submitted in order to pass the class. The course is intended for graduate students or advanced undergraduate students who have mostly completed their requirements and are deeply interested in the material. The weighting for grading will be 60% Homework and 40% Project.

In order to pass the course, you must attempt and submit all homework, even if they are submitted for zero credit (as per the late day policy below). Each student will have 96 cumulative hours of late time (as measured on Gradescope), which will be forgiven. After this cumulative amount of time has passed, any assignment that is turned in late will receive 20% off per day. Furthermore, only up to 48 hours of late time may be used on any one assignment.

Course Project

See the Project Page for more information.

Diversity and Inclusiveness

While many academic disciplines have historically been dominated by one cross-section of society, the study of and participation in STEM disciplines is a joy that the instructors hope everyone can pursue, regardless of their socio-economic background, race, gender, etc. We encourage students to both be mindful of these issues and, in good faith, try to take steps to fix them. You are the next generation here. You should expect to be treated by your classmates and the course staff with respect. We subscribe to Harvard's Values for Inclusion.