CS 5955/6955 Advanced Artificial Intelligence - Spring 2026

Class Overview

This course covers advanced algorithms for intelligent sequential decision making, with an emphasis on modern deep learning-based methods. The class will cover both the theory and practical details of the algorithms behind recent breakthroughs in many types of AI decision making, including game playing, robotics, recommendation systems, and large language models. Topics include bandit algorithms, Markov decision processes, partially observable Markov decision processes, reinforcement learning, imitation learning, inverse reinforcement learning, and reinforcement learning from human feedback.

This will be a fun but challenging class. Because it is an advanced AI class, we assume a basic understanding of machine learning (supervised learning, loss functions, gradient descent) and of AI (search problems, MDPs, high-level RL ideas). These topics can be picked up during the class, since we will try to keep things self-contained, but we will go over the basics quickly to get to more advanced material. Students should be comfortable writing Python code and reading and understanding code written by others.

Class Schedule

Date | Topic | Slides | Readings | Assignment
Jan 5 | Class Intro | Slides | |
Jan 7 | Behavioral Cloning | Slides | Sections 1-2 of this survey, Behavioral Cloning from Observation | HW1: Behavior Cloning in PyTorch (due Jan 14)
Jan 12 | Interactive Behavioral Cloning | Slides | DAgger (you can skim/skip the math), ThriftyDAgger |
Jan 14 | Advanced Behavior Cloning 1 | Slides | Implicit Behavioral Cloning, Action Chunking Transformer (just abstract and intro), Diffusion Policy (just abstract and intro) | HW2: Advanced BC Homework (due Jan 26)
Jan 19 | Holiday | | |
Jan 21 | Advanced Behavior Cloning 2 | Slides | Action Chunking Transformer (full paper), Diffusion Policy (full paper) |
Jan 26 | Multi-Armed Bandits: Epsilon greedy and UCB1 | Slides | Sutton and Barto 2.1-2.5, UCB1 | HW3: Multi-Armed Bandits (due Feb 3)
Jan 28 | Contextual Bandits and Intro to RL and MDPs | Slides | Chapter 3 of Sutton and Barto |
Feb 2 | Q-Learning, SARSA, and DQN | Slides | Sections 6.1-6.5, Nature DQN | Homework 4: Q-Learning and DQN (due Feb 11)
Feb 4 | Policy-Gradients | Slides | Read Intro Parts 1-3 |
Feb 9 | Actor Critic Algorithms | Slides | A3C |
Feb 11 | PPO and GRPO | | PPO | HW5: Policy Gradients (due Feb 20)
Feb 16 | Holiday | | |
Feb 18 | Inverse RL 1 | | Abbeel and Ng, MaxEnt IRL |
Feb 23 | Interactive RL 1 | | TAMER, DeepTAMER |
Feb 25 | RLHF 1 | | Christiano, T-REX | HW6: Reward Learning (due Mar 4)
Mar 2 | Reward Learning from Multiple Feedback Types | | RRIC, Learning from Demos, Corrections, and Prefs |
Mar 4 | Catch up and Project Ideas | | | Final project team selection due March 6.
Mar 7-15 | Spring Break | | |
Mar 16 | Inverse RL 2 | | Bayesian IRL, Bayesian Preference Learning | Final project pitch due on Canvas. Instructions here.
Mar 18 | Inverse RL 3 | | GAIL, IQ-Learn |
Mar 23 | Interactive RL 2 | | Coach, X2T |
Mar 25 | RLHF 2 | | Learning to Summarize, InstructGPT |
Mar 30 | RLHF 3 | | DPO, Constitutional AI |
Apr 1 | Advanced RL | | DDPG, TD3 |
Apr 6 | Advanced RL | | SAC, FQL |
Apr 8 | AI Safety and Ethics | | |
Apr 13 | Final project presentations | | |
Apr 15 | Final project presentations | | |
Apr 20 | Final project presentations | | |
Apr 29 | Final project reports due | | | Instructions. Use this template (simply go to menu and select copy project).

Additional Resources

Here you can find supplementary materials, links, etc.

PyTorch Tutorials

RL Code Resources