Sham M. Kakade

Research:

I work on advancing the fundamental capabilities needed to develop artificial general intelligence and create systems that can effectively interact with and add value to the real world. My current interests include:

(i) developing full-stack training pipelines for foundation models, with particular focus on distributed systems architecture, scalable optimization algorithms, and principled approaches to data curation and composition.
(ii) investigating the mathematical and scientific principles that govern large-scale learning systems, with emphasis on understanding emergent capabilities, scaling laws, and fundamental limits of neural architectures.
(iii) advancing autonomous agent architectures that can reason, plan, and learn from interaction, with a focus on bridging the gap between language models and embodied intelligence in complex environments.

Prospective Students:

I am seeking students with diverse backgrounds, including those experienced in applied deep learning, as well as those with strong foundations in optimization, mathematics, and theoretical computer science. The newly established [Kempner Institute], which I co-direct, offers substantial computational resources for cutting-edge research. If you're interested, I encourage you to apply to Harvard!

Recent Blog Posts:

Selected blog posts exploring fundamental questions in AI and technical advances (see also [publications] and [Deeper Learning]):

[How Does Critical Batch Size Scale in Pre-training?]

[Mixture of Parrots: Experts Improve Memorization More Than Reasoning]

[Transcendence: Generative Models Can Outperform the Experts That Train Them]

[Repeat After Me: Transformers are Better than State Space Models at Copying]

Selected Service:

[Committee for the ACM Prize in Computing] (active)
[Committee for the Sloan Research Fellowships] in Computer Science
Co-organizer for the Simons Symposium on [New Directions in Theoretical Machine Learning], May 2019
Program chair for the 24th Annual Conference on Learning Theory (COLT 2011)

Other Good Stuff:

[What is the Value of Human-Level AI to Education?]