CS/Stat 184: Introduction to Reinforcement Learning
Modern AI systems often need the ability to make
sequential decisions in an unknown, uncertain, possibly
hostile environment, by actively interacting with the
environment to collect relevant data. Reinforcement
Learning (RL) is a general framework that can capture the
interactive learning setting and has been used to design
intelligent agents that achieve high-level performance in
challenging applications such as Go, computer games,
robotic manipulation, health care, and education.
This course provides an introduction to reinforcement
learning covering a range of problem formulations,
algorithms, and theory. The three main themes of the course
are (1) Markov decision processes (Bellman
equations/optimality, planning, UCB, unknown environments,
linear quadratic control, exploration, imitation
learning), (2) bandits (epsilon-greedy, UCB, Thompson
sampling, contextual bandits, linear bandits, exploration
in MDPs), and (3) deep RL and methods for large-scale
systems (policy gradient methods, Monte Carlo tree search,
Q-learning, imitation learning).
There will also be an Embedded Ethics lecture on
ethical issues arising in reinforcement learning. The
assignments will focus on a mix of algorithmic and
statistical principles, along with their programming
implementations.
After taking this course, students will be able to understand fundamental RL algorithms and their
analysis. All homework will have a programming
component to give students more hands-on experience with
the concepts.
Staff and Organization
Instructors: Lucas Janson, Sham Kakade
TFs:
Benjamin Schiffer
CAs:
Luke Bailey, Alex Dazhen Cai, Kevin Yee Du, Kevin Yifan Huang, Saket Joshi, Thomas
Kaminsky, Patrick McDonald, Eric Meng Shen,
Natnael Mekuria Teshome, Jeffrey George Wang
Lecture time: Monday/Wednesday 11:15am - 12:30pm
Lecture location: 114 Western Ave, 2111+2112
Sections (starting 9/11/23):
Mon 5-6pm, SC706
Tue 10:30-11:30am, SC706
Wed 12:45-1:45pm, SEC LL2.221
Thu 11am-12pm, SC706
Fri 2-3pm, SC706
**Please double check the
website for office hour location
changes/cancellations before you arrive.**
Instructor office hours:
Lucas Janson: Thu 10:00-11:00am, SC 710
Sham Kakade: Thu 3:00-4:00pm, SEC 4.410
TA office hours:
Luke & Thomas: Mon 7-9pm, Dunster DHall
Ben: Tue 11:30am-1:30pm, SC706
Alex & Patrick: Tue 7-9pm, SC706
Kevin D & Kevin H & Eric: Wed 7-9pm, SC706
Natnael & Saket: Fri 3-6pm, SC706
Discussion: Ed discussion board
Contact Info:
Please communicate with the instructors only
by making a post that is “Private”, i.e., “Visible to you and staff only” in Ed.
Any course related email sent directly to the
instructors will not be responded to in a timely manner.
Announcements:
Please make sure you monitor for (and receive) announcements from both the official class mailing
list and from Ed. Ed is a convenient way to send out some announcements, such as homework corrections
and clarifications. It is important for you to make sure you get these announcements in a timely manner.
Prerequisites
Lectures will
focus on algorithm design and analysis. We require a
background in: calculus & linear algebra (e.g., AM22a, Math
21b), probability theory (e.g., Stat 110), and programming
in Python. The following topics are recommended but not
required: linear regression, supervised learning,
algorithms.
Homeworks will
have a programming component, and we expect students to be
comfortable with programming in Python (or committed to
quickly learning it). We will use Python as the programming language in all HWs.
Grading Policies
(TENTATIVE) Participation 5%; Assignments
45% (HW0: 5%, HW1-HW4: 40% total);
Midterm 20%; Project 30%.
The course is letter-graded by default, but you may
switch to SAT/UNSAT if you prefer.
In order to pass the course, you must
attempt and submit all homework assignments, even if they are
submitted for zero credit (as per the late day policy
below). We will also have an "embedded ethics" lecture
with 1-2 corresponding questions, either incorporated into
a homework or as a standalone short assignment (with the
grading scheme adjusted appropriately). All homeworks are
mathematical and have a programming component (we use
Python and OpenAI
Gym).
Participation: 5% of the grade will
be participation. People can participate in the course in
many different ways, including regular attendance of
lectures (there will be a form after each class where
students can record their attendance), participating in
section, in the Ed forum, and more. At the end of the
term, you will write a paragraph on how you participated
in the course. The requirements to get the full 5%
contribution will not be too onerous, and regularly
attending the lectures will suffice. If for some reason
you are not able to regularly attend all the lectures,
then increased participation in Ed and section will be
sufficient. If you have another responsibility that
prevents you from attending all the lectures, please let
us know by making a post that is “Private”, i.e., “Visible to
you and staff only” in Ed, and we will take this into
consideration.
Homework Policies: Collaboration is permitted
though each student must understand, write, and hand in
their own submission. In particular, it is acceptable for
students to discuss problems with each other; it is not
acceptable for students to look at another student's
written answers. It is also not acceptable to publicly
post your (partial) solution on Ed, but it is
encouraged for you to ask public questions on Ed. You
must also indicate on each homework with whom you
collaborated and what online resources you used.
Each student will have 96 cumulative hours
of late time (as measured on Gradescope), which will
be forgiven. After this cumulative amount of time has
passed, any assignment that is turned in late will receive
zero credit. Furthermore, only up to 48 hours of late time
may be used on any one assignment; any assignment turned
in more than 48 hours late will receive zero credit.
The final homework score for HW1-4 will be
determined by summing up the total points earned across
all four assignments. This sum will then be divided by
the total possible points to calculate the overall
percentage score for the HW1-4 component of the
course.
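As a rough illustration of the calculation described above, the short Python sketch below pools points across the four assignments. The point values and the function name `hw_component_percentage` are hypothetical, chosen only for illustration; they are not actual course values or official grading code.

```python
def hw_component_percentage(earned, possible):
    """Pooled percentage: total points earned divided by total points possible."""
    if len(earned) != len(possible):
        raise ValueError("earned and possible must have the same length")
    return 100.0 * sum(earned) / sum(possible)


# Hypothetical point totals for HW1-HW4 (assumed values, not from the course).
earned = [40, 45, 30, 50]
possible = [50, 50, 40, 60]

pooled = hw_component_percentage(earned, possible)

# For comparison only: averaging the per-assignment percentages would weight
# every homework equally, which is not what the policy describes.
mean_of_percentages = 100.0 * sum(e / p for e, p in zip(earned, possible)) / len(earned)

print(f"pooled: {pooled:.2f}%")
print(f"mean of per-assignment percentages: {mean_of_percentages:.2f}%")
```

Under this reading, an assignment with more possible points counts proportionally more toward the HW1-4 component.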
We highly encourage you to use LaTeX. We will also
accept neatly written handwritten homework.
Homeworks must be submitted through Gradescope. PDF
files of the homeworks can be accessed on Gradescope. PDF and LaTeX
files for the homeworks will also be uploaded to Canvas.
Regrading Policy:
All homework regrading requests must be submitted on Gradescope within
seven days after the grades are released. For example,
if we return the grades on Monday, then you have until midnight the
following Monday to submit any regrade requests. If you feel that we
have made an error in grading your homework, please let us know with a
written explanation. This policy is to ensure that we can address any
concerns in a timely and fair manner. Office hours and in-person
discussions are limited to knowledge-related
questions. Grade-related questions must be submitted by making a post
that is “Private”, i.e., “Visible to you and staff only” in Ed.
Project:
Please see the course project page.
Diversity and Inclusiveness
While many academic disciplines have historically been dominated by one cross section of society,
the study of and participation in STEM disciplines is a joy that the instructors hope that everyone can
pursue,
regardless of their socio-economic background, race, gender, etc.
We encourage students to both be mindful of these issues, and,
in good faith, try to take steps to fix them. You are the next generation here.
You should expect to be treated by your classmates and the course staff with respect.
You belong here, and we are here to help you learn and enjoy this course.
If any incident occurs that challenges this commitment to a supportive and inclusive environment,
please let the instructors know so that the issue can be addressed. We are personally committed to this
and subscribe to
Harvard's
Values for Inclusion.
Honor Code
You must always understand and write up your own solutions.
Collaborate only where explicitly allowed.
Do not use forums like Course Hero, Chegg, etc.
Properly cite any outside materials you use for your HWs.
Do not directly search for answers on the
internet. If you are unclear about
whether some online material can be used, please ask the course staff first.
No sharing of your solutions within or outside
class at any time.
Do not use generative AI tools to explicitly
obtain answers. Think of generative AI tools as you would a
message board or collaborator: you can use them for assistance
(and if you do, you should cite them) but you may not directly ask them for the answer.
The above is not an exhaustive list, and in general,
common sense rules about academic integrity apply. If you are
in doubt about something, please ask us whether it is OK before you
do it. Also see the Harvard
College Honor Code.
Course Materials
Slides will be posted before each lecture, and
annotated slides (with all notes taken on them by the
instructor during lecture) will be posted after each
lecture. We will make reasonable attempts to record and post
each lecture. It is possible that some lectures may not be recorded,
in which case we will not be able to do any make-ups of that lecture. We
encourage the students to attend the lectures in person (see the Participation Policy) and participate
in the class discussion.
Section materials will also be posted by the TFs. These
materials serve as the reference material for the course
content.
We will often post "Supplementary Reading" for a lecture, but this reading is strictly supplementary:
homework and exam questions will be based purely on material covered in the lectures and sections.
One source of supplementary material will be from a draft textbook being written for this course.
This material, when available, should closely follow the lecture content, notation,
and structure, with some additional material and examples.
Feedback on this draft is welcome and appreciated via the Ed forum.
More advanced supplementary reading may come from the working draft of
the book "Reinforcement Learning Theory and
Algorithms", available
here.
Note that this is an advanced RL theory book, with much material out of the scope of this class, so we
will only use very select subsections of it as supplemental reading.
If you find typos or errors, please let the authors (e.g., Sham) know--they would appreciate it!
You can also self-study from the classic book "Reinforcement Learning: An Introduction", available here.