CS/Stat 184: Introduction to Reinforcement Learning
Enrollment information: This year,
because it is a new course, CS/Stat 184 will have its enrollment
capped at 120. We will respond to all petitions the morning of
August 26, so in order to ensure consideration for a spot in the
course, submit an enrollment petition (it can be blank) before
August 26. On August 26, we will approve petitions in the following
order: (1) any senior undergraduate, (2) any other undergraduate who
has a concentration or secondary in CS or Statistics, in decreasing
order of seniority, (3) any other undergraduates, in decreasing
order of seniority, (4) all graduate students (note this is an
undergraduate class, hence priority is given to undergraduates). Any
ties will be broken randomly among petitions with equal priority. If
your petition is not approved on August 26, you may still be able to
take the class; if spots open up as students drop, we will enroll
further students in the same order as above.
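The approval order above can be read as a sort over petitions. The sketch below is only an illustration, not the staff's actual procedure. The `Petition` fields, the integer encoding of `year` (4 = senior, 3 = junior, and so on), and the function names are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Petition:
    name: str
    is_undergrad: bool
    year: int          # hypothetical encoding: 4 = senior, 3 = junior, ...
    cs_or_stat: bool   # concentration or secondary in CS or Statistics

def tier(p):
    """Priority tier 1-4, following the order listed above."""
    if not p.is_undergrad:
        return 4                      # (4) all graduate students
    if p.year == 4:
        return 1                      # (1) any senior undergraduate
    return 2 if p.cs_or_stat else 3   # (2) CS/Stat undergrads, (3) other undergrads

def approval_order(petitions, rng=None):
    """Order petitions by tier, then seniority (tiers 2 and 3), then randomly."""
    rng = rng or random.Random()

    def key(p):
        t = tier(p)
        # Seniority only orders petitions inside tiers 2 and 3.
        seniority = -p.year if t in (2, 3) else 0
        return (t, seniority, rng.random())  # random value breaks remaining ties

    return sorted(petitions, key=key)

def approve(petitions, capacity=120, rng=None):
    """Split the ordered petitions into approved and waiting lists."""
    ordered = approval_order(petitions, rng)
    return ordered[:capacity], ordered[capacity:]
```

Under this reading, spots that open up later would go to the front of the waiting list, which keeps the same order.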
Modern Artificial Intelligence (AI) systems often need the ability to make sequential decisions in an
uncertain, possibly hostile environment, by actively interacting with the environment to collect relevant
Reinforcement Learning (RL) is a general framework that can capture the interactive learning setting and
has been used to design intelligent agents that achieve super-human level performance on
challenging tasks such as Go, computer games, and robotic manipulation.
This course focuses on the basics of Reinforcement Learning. The four main parts of the course are
(1) multi-armed bandits, (2) Planning and
Control in MDPs, (3) Learning in Large MDPs (function approximation), and (4)
After taking this course, students will be able to understand fundamental RL algorithms and their
All lectures will be math heavy. We will go
through algorithms and their analysis. All homework will have
a programming component to give students more hands-on
experience with the concepts.
Staff and Organization
Instructors: Lucas Janson, Sham Kakade
Daniel Garces, Kuanhao Jiang, Yanke Song
Alex Dazhen Cai, Howie Guo, Angela Yilin Li, Richard Qiu, Eric Meng
Shen, Lara Zeng, Saba Zerefa
Lecture time: Tuesday/Thursday 10:30am - 11:45am
Lecture location: Maxwell Dworkin G115
**Please double check the
website for office hour location
changes/cancellations before you arrive.**
Instructor office hours:
Lucas Janson: Mon 1:30-2:30pm, SC 710
Sham Kakade: Thu 4-5pm, SEC 4.410
TA office hours:
Alex, Howie & Richard: Thu 3-5pm, Sever Hall 110
Angela & Eric: Sun 2-4pm, Quincy Dhall
Daniel: Fri 11am-noon, SEC 1.414
Kuanhao: Tue & Thu 2:30-3:30pm, Sever Hall 214
Lara & Saba: Tue 8:30-10:30pm, Lowell Dhall
Yanke: Mon & Wed 5:30-6:30pm, SC 222
Discussion: Ed discussion board
Please communicate with the instructors only
by making a private post to the "instructors" in Ed.
Any course-related email sent directly to the
instructors will not be responded to in a timely manner.
Please make sure you monitor for (and receive) announcements from both the official class mailing
list and from Ed. Ed is a convenient way to send out some announcements, such as homework corrections
and clarifications. It is important for you to make sure you get these announcements in a timely manner.
Lectures will be mathematically oriented, where we
focus on algorithm design and analysis. We require a
background in: calculus & linear algebra (e.g., AM22a, Math
21b), probability theory (e.g., Stat 110), and programming
in Python. The following topics are recommended but not
required: linear regression, supervised learning,
have a programming component, and we expect students to be
comfortable with programming in Python (or committed to
quickly learning it). We will use Python as the programming language in all HWs.
Assignments 70% (HW0: 6%, HW1-HW4: 16%
each); Final 30%
The course is letter-graded by default, but you may switch to SAT/UNSAT if you prefer.
In order to pass the course, you must attempt and
submit all homeworks, even if they are submitted for zero
credit (as per the late day policy below).
We will also have an "embedded ethics" lecture with
1-2 corresponding questions, either incorporated into a homework or as a
standalone short assignment (with the grading scheme adjusted appropriately).
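As a quick check of the arithmetic, the weights above can be combined as in the following sketch. It assumes each component is scored as a fraction in [0, 1] and ignores any adjustment made for the embedded ethics assignment. The function name and the example scores are hypothetical.

```python
# Weights from the grading breakdown above, in percent of the final grade.
WEIGHTS = {"HW0": 6, "HW1": 16, "HW2": 16, "HW3": 16, "HW4": 16, "Final": 30}

def course_percentage(scores):
    """Weighted course percentage from per-component scores in [0, 1].

    `scores` maps every component name in WEIGHTS to the fraction of
    points earned on that component.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Made-up scores for illustration only.
example = {"HW0": 1.0, "HW1": 0.9, "HW2": 0.8, "HW3": 1.0, "HW4": 0.75, "Final": 0.85}
print(round(course_percentage(example), 1))
```

The homework weights sum to 6 + 4 × 16 = 70 and the final contributes the remaining 30.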
All homeworks are mathematical and have a
programming component (we use Python and OpenAI Gym). The final
exam covers concepts and algorithms,
and it does not contain a programming component.
Homework Policies: Collaboration is permitted, though each
student must understand, write, and hand in their own
submission. In particular, it is acceptable for students to
discuss problems with each other; it is not acceptable for
students to look at another student's written answers. It is
also not acceptable to publicly post your (partial)
solution on Ed, but you are encouraged to ask
public questions on Ed.
You must also indicate on each homework with whom you
collaborated and what online resources you used.
Each student will have 96 cumulative hours of late time (as
measured on Canvas), which will be forgiven. After this
cumulative amount of time has passed, any assignment that is
turned in late will receive zero credit. Furthermore,
only up to 48 hours of late time may be used on any one
assignment; any assignment turned in more than 48 hours
late will receive zero credit.
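One way to read the late-time rules is sketched below; it is not an official calculator. It assumes assignments are processed in submission order, and that a late submission which exceeds the remaining budget receives zero credit without consuming any budget. The policy text does not spell out that last case, so check with the course staff if it matters. The function and constant names are hypothetical.

```python
TOTAL_LATE_BUDGET_HOURS = 96   # cumulative forgiven late time, from the policy above
PER_ASSIGNMENT_CAP_HOURS = 48  # most late time usable on any one assignment

def apply_late_policy(hours_late_per_hw):
    """Return (credited, remaining_budget) for a sequence of submissions.

    `hours_late_per_hw` lists how many hours late each assignment was
    (0 for on time), in submission order.
    """
    remaining = TOTAL_LATE_BUDGET_HOURS
    credited = []
    for hours in hours_late_per_hw:
        if hours <= 0:
            credited.append(True)            # on time: always credited
        elif hours > PER_ASSIGNMENT_CAP_HOURS or hours > remaining:
            credited.append(False)           # over the cap or over budget: zero credit
        else:
            remaining -= hours               # forgiven; draw down the budget
            credited.append(True)
    return credited, remaining

# Example: on time, 30h late, 50h late (over the cap), 48h late, 20h late (over budget).
print(apply_late_policy([0, 30, 50, 48, 20]))
# -> ([True, True, False, True, False], 18)
```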
We highly encourage you to use LaTeX. We will also accept
neatly written handwritten homework.
Homeworks must be submitted through Gradescope. PDF files of the homeworks can be
accessed on Gradescope. PDF and LaTeX files for the homeworks will also be uploaded to Canvas.
All homework regrading requests must be submitted on Gradescope within
seven days after the grades are released. For example,
if we return the grades on Monday, then you have until midnight the
following Monday to submit any regrade requests. If you feel that we
have made an error in grading your homework, please let us know with a
written explanation. This policy is to ensure that we can address any
concerns in a timely and fair manner. Office hours and in-person
discussions are limited solely to knowledge-related
questions. Grade-related questions must be submitted to the
course mailing list.
If you are not able to make the final exam on the official date (and
do not have an exception based on university policies), then please do not
enroll in the course. The course is in Exam Group FAS14_B.
Diversity and Inclusiveness
While many academic disciplines have historically been dominated by one cross section of society,
the study of and participation in STEM disciplines is a joy that the instructors hope that everyone can
regardless of their socio-economic background, race, gender, etc.
We encourage students to both be mindful of these issues, and,
in good faith, try to take steps to fix them. You are the next generation here.
You should expect to be treated by your classmates and the course staff with respect.
You belong here, and we are here to help you learn and enjoy this course.
If any incident occurs that challenges this commitment to a supportive and inclusive environment,
please let the instructors know so that the issue can be addressed. We are personally committed to this
and subscribe to
Values for Inclusion.
Collaborations only where explicitly allowed.
Do not use forums like Course Hero, Chegg, etc.
Properly cite any outside materials you use for your HWs.
If you are unclear about
whether some online material can be used, please ask the course staff first.
No sharing of your solutions within or outside class at any time.
The above is not an exhaustive list, and in general,
common sense rules about academic integrity apply. If you are
in doubt about something, please ask us whether it is OK before you
do it. Also see the Harvard
College Honor Code.
Slides will be posted before each lecture, and
annotated slides (with all notes taken on them by the
instructor during lecture) will be posted after each
lecture. We will make reasonable attempts to record and post
each lecture. It is possible that some lectures may not be recorded,
in which case we will not be able to do any make-ups of that lecture. We
encourage students to attend the lectures in person and participate
in the class discussion.
Section materials will also be posted by the TFs. These
materials serve as the reference material for the course.
We will often post "Supplementary Reading" for a lecture, but this reading is strictly supplementary:
homework and final exam questions will be based purely on material covered in the lectures and sections.
The supplementary reading will sometimes come from the working draft of
the book "Reinforcement Learning Theory and
Note that this is an advanced RL theory book, with much material out of the scope of this class, so we
will only use very select subsections of it as supplemental reading.
If you find typos or errors, please let the authors (e.g., Sham) know--they would appreciate it!
You can also self-study from the classic book "Reinforcement Learning: An Introduction" by Sutton and Barto, which is freely available online.