| Date | Topic | Slides | Readings | Assignment | |
|---|---|---|---|---|---|
| Jan 5 | Class Intro | Slides | |||
| Jan 7 | Behavioral Cloning | Slides | Sections 1-2 of this survey, Behavioral Cloning from Observation | HW1: Behavior Cloning in PyTorch (due Jan 14) | |
| Jan 12 | Interactive Behavioral Cloning | Slides | DAgger (you can skim/skip the math), ThriftyDAgger | ||
| Jan 14 | Advanced Behavior Cloning 1 | Slides | Implicit Behavioral Cloning, Action Chunking Transformer (just abstract and intro), Diffusion Policy (just abstract and intro) | HW2: Advanced BC Homework (due Jan 26) | |
| Jan 19 | Holiday | ||||
| Jan 21 | Advanced Behavior Cloning 2 | Slides | Action Chunking Transformer (full paper), Diffusion Policy (full paper) | ||
| Jan 26 | Multi-Armed Bandits: Epsilon-Greedy and UCB1 | Slides | Sutton and Barto 2.1-2.5, UCB1 | HW3: Multi-Armed Bandits (due Feb 3) | |
| Jan 28 | Contextual Bandits and Intro to RL and MDPs | Slides | Chapter 3 of Sutton and Barto | ||
| Feb 2 | Q-Learning, SARSA, and DQN | Slides | Sutton and Barto 6.1-6.5, Nature DQN | HW4: Q-Learning and DQN (due Feb 11) | |
| Feb 4 | Policy Gradients | Slides | Read Intro Parts 1-3 | ||
| Feb 9 | Actor Critic Algorithms | Slides | A3C | ||
| Feb 11 | PPO and GRPO | | PPO | HW5: Policy Gradients (due Feb 20) | |
| Feb 16 | Holiday | ||||
| Feb 18 | Inverse RL 1 | | Abbeel and Ng, MaxEnt IRL | | |
| Feb 23 | Interactive RL 1 | | TAMER, DeepTAMER | | |
| Feb 25 | RLHF 1 | | Christiano, T-REX | HW6: Reward Learning (due Mar 4) | |
| Mar 2 | Reward Learning from Multiple Feedback Types | | RRIC, Learning from Demos, Corrections, and Prefs | | |
| Mar 4 | Catch-up and Project Ideas | | | Final project team selection due March 6. | |
| Mar 7-15 | Spring Break | ||||
| Mar 16 | Inverse RL 2 | | Bayesian IRL, Bayesian Preference Learning | Final project pitch due on Canvas. Instructions here. | |
| Mar 18 | Inverse RL 3 | | GAIL, IQ-Learn | | |
| Mar 23 | Interactive RL 2 | | Coach, X2T | | |
| Mar 25 | RLHF 2 | | Learning to Summarize, InstructGPT | | |
| Mar 30 | RLHF 3 | | DPO, Constitutional AI | | |
| Apr 1 | Advanced RL | | DDPG, TD3 | | |
| Apr 6 | Advanced RL | | SAC, FQL | | |
| Apr 8 | AI Safety and Ethics | ||||
| Apr 13 | Final project presentations | ||||
| Apr 15 | Final project presentations | ||||
| Apr 20 | Final project presentations | ||||
| Apr 29 | Final project reports due | | Instructions | Use this template (in the menu, select copy project). | |

Supplementary materials and links:

- PyTorch Tutorials
- RL Code Resources