CS 5955/6955 Advanced Artificial Intelligence - Spring 2026

Class Overview

This course covers advanced algorithms for intelligent sequential decision making, with an emphasis on modern deep learning-based methods. The class will cover both the theory and the practical details of the algorithms behind recent breakthroughs in many types of AI decision making, including game playing, robotics, recommendation systems, and large language models. Topics include bandit algorithms, Markov decision processes, partially observable Markov decision processes, reinforcement learning, imitation learning, inverse reinforcement learning, and reinforcement learning from human feedback. This will be a fun but challenging class.

Because this is an advanced AI class, we will assume a basic understanding of machine learning (supervised learning, loss functions, gradient descent) and of AI (search problems, MDPs, high-level RL ideas). We will try to keep things self-contained, so these topics can be picked up during the class, but we will go over the basics quickly to get to more advanced material. Students should be comfortable writing Python code and digging through and understanding code written by others.
Note: Schedule under construction

Class Schedule

Date | Topic | Slides | Readings | Assignment
Jan 5 | Class Intro | Slides | |
Jan 7 | Behavioral Cloning | Slides | Sections 1-2 of this survey, Behavioral Cloning from Observation | HW1: Behavior Cloning in PyTorch (due Jan 14)
Jan 12 | Interactive Behavioral Cloning | Slides | DAgger (you can skim/skip the math), ThriftyDAgger |
Jan 14 | Advanced Behavior Cloning 1 | Slides | Implicit Behavioral Cloning, Action Chunking Transformer (just abstract and intro), Diffusion Policy (just abstract and intro) | HW2: Advanced BC Homework (due Jan 26)
Jan 19 | Holiday | | |
Jan 21 | Advanced Behavior Cloning 2 | | Action Chunking Transformer (full paper), Diffusion Policy (full paper) |
Jan 26 | Multi-Armed Bandits: Epsilon greedy and UCB1 | | Sutton and Barto 2.1-2.5, UCB1 | HW3: Multi-Armed Bandits (due Feb 2)
Jan 28 | Contextual Bandits | | Overview, EXP4 |
Feb 2 | Q-Learning, SARSA, and DQN | Slides | Q-Learning, SARSA, Nature DQN | Homework 4: Q-Learning and DQN (due Feb 9)
Feb 4 | Policy-Gradients and REINFORCE | Slides | Read Intro Parts 1-3 |
Feb 9 | Actor Critic RL and PPO | | A3C, PPO |
Feb 11 | Inverse RL Intro | Slides | Ng & Russell, Abbeel and Ng | HW5: Policy Gradients (due Feb 20)
Feb 16 | Holiday | | |
Feb 18 | Interactive RL Intro | Slides | TAMER, DeepTAMER |
Feb 23 | RLHF Intro | Slides | Christiano, T-REX |
Feb 25 | RLHF 2 | Slides | Learning to Summarize, InstructGPT |
Mar 2 | Reward Learning from Multiple Feedback Types | Slides | RRIC, Losey Unifying |
Mar 4 | Catch up and Project Ideas | Slides | | Final project team selection due March 6.
Mar 7-15 | Spring Break | | |
Mar 16 | Inverse RL 2 | Slides | MaxEnt, GuidedCost | Final project pitch due on Canvas. Instructions here.
Mar 18 | Inverse RL 3 | Slides | Bayesian IRL, Bayesian Pref Learning |
Mar 23 | Interactive RL 2 | Slides | Bayesian IRL, Bayesian Pref Learning |
Feb 25 | No Class: Do Reading Assignment Instead | | | Final Project Lit Review and Full Proposal due April 4.
Mar 2 | Special Topics: Shared Control, Early Failure Detection for Robot Surgery | | VOSA, Early Failure Detection | Pick one paper from the readings for today and submit reading report before class.
Mar 4 | Special Topics: RLHF for Robot Surgery, Explainable Reward Learning, Adversarial Attacks on Behavioral Cloning | | RLHF for Surgery, Reward DDTs, Adversarial Attacks | Pick one paper from the readings for today and submit reading report before class.
Mar 18 | PPO | Slides | A3C, PPO |
Mar 23 | DDPG, TD3, and SAC | Slides | DDPG, TD3, SAC |
Mar 25 | Multi-Agent RL | Slides | MARL book (chapter 5), VDN, QMIX, MAPPO | Homework 5 due March 28
Mar 30 | Model Based RL | Slides | World Models, PlaNet Paper or PlaNet Blog, Dreamer | Read one of the above papers/posts and submit a reading report before class.
Apr 1 | Inverse RL and Reward Learning | Slides | | Final Project Lit Review and Full Proposal due April 4.
Apr 8 | LLM Agents and RL | Slides | |
Apr 13 | Final project presentations | | | Schedule Apr 14 Shared Slide Deck
Apr 15 | Final project presentations | | | Schedule Apr 16 Shared Slide Deck
Apr 20 | Final project presentations | | | Schedule Apr 21 Shared Slide Deck
Apr 29 | Final project reports due | | | Instructions Use this template (simply go to menu and select copy project).

Additional Resources

Here you can find supplementary materials, links, etc.

PyTorch Tutorials

RL Code Resources

  • CleanRL: Single-file implementations of popular RL algorithms.
  • Spinning Up in Deep RL: Simple, clean implementations of popular RL algorithms. It has not been updated to work with the newest version of Gymnasium.
  • Stable Baselines: Extensive codebase for running RL experiments, with many algorithm implementations.
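
When older RL code does not run against a recent Gymnasium release, one common cause is the change in the environment interface. Legacy Gym environments return only an observation from reset() and a 4-tuple (obs, reward, done, info) from step(), while Gymnasium returns (obs, info) from reset() and a 5-tuple (obs, reward, terminated, truncated, info) from step(). The sketch below illustrates that difference using only the standard library. It assumes this interface change is the source of the incompatibility, which may not be the whole story for any particular codebase. OldStyleEnv, NewStyleAdapter, and rollout are hypothetical names made up for illustration and are not part of any of the libraries listed above.

```python
class OldStyleEnv:
    """Hypothetical toy environment that follows the legacy Gym call signatures."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # legacy interface: observation only

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        # legacy interface: (obs, reward, done, info)
        return self.t, float(action), done, {}


class NewStyleAdapter:
    """Hypothetical adapter exposing a legacy env through Gymnasium-style signatures."""

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None, options=None):
        # seed and options are accepted for signature compatibility but ignored here
        obs = self.env.reset()
        return obs, {}  # Gymnasium-style: (obs, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Legacy time-limit wrappers flagged truncation through this info key.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        # Gymnasium-style: (obs, reward, terminated, truncated, info)
        return obs, reward, terminated, truncated, info


def rollout(env, policy):
    """Run one episode with the Gymnasium-style loop; return (total reward, steps)."""
    obs, info = env.reset()
    total, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        steps += 1
        if terminated or truncated:
            return total, steps


if __name__ == "__main__":
    env = NewStyleAdapter(OldStyleEnv(horizon=3))
    print(rollout(env, lambda obs: 1))
```

Keeping terminated and truncated separate matters for value-based methods, since bootstrapping from the next state is usually still appropriate when an episode was only cut off by a time limit.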