CS 5955/6955 Advanced Artificial Intelligence

Class Overview

This course covers advanced algorithms for intelligent sequential decision making, with an emphasis on modern deep learning-based methods. The class will cover both the theory and the practical details of the algorithms behind recent breakthroughs in many types of AI decision making, including game playing, robotics, recommendation systems, and large language models. Topics include bandit algorithms, Markov decision processes, partially observable Markov decision processes, reinforcement learning, imitation learning, inverse reinforcement learning, and reinforcement learning from human feedback. This will be a fun but challenging class. Because it is an advanced AI class, we assume a basic understanding of machine learning (supervised learning, loss functions, gradient descent) and of AI (search problems, MDPs, high-level RL ideas). These topics can be picked up during the class, since we will try to keep things self-contained, but we will go over them quickly to get to more advanced material. Students should be comfortable writing Python code and digging through and understanding code written by others.

Class Schedule

Date Topic Slides Readings Assignment
Jan 6 Class Intro Slides Optional: Python Notes (Alan Kuntz) Optional: Python Tutorial (Berkeley AI Class)
Jan 8 Behavioral Cloning Slides Optional: Behavioral Cloning from Observation, DAgger, ThriftyDAgger Behavior Cloning in PyTorch (due Friday Jan 17)
Jan 13 Intro to Advanced Behavior Cloning Slides Choose one and submit reading report before class: Implicit Behavioral Cloning, Action Chunking Transformer, Diffusion Policy
Jan 15 More Advanced Behavior Cloning Slides
Jan 22 Multi-Armed Bandits and Evaluative Feedback Slides Sutton and Barto 2.1-2.5 Multi-Armed Bandits (due Fri Jan 31)
Jan 27 More Bandits Slides
Jan 29 Intro to Markov Decision Processes Slides
Feb 3 Solving MDPs Slides Exact Solution Methods for MDPs Homework 3 (Due Feb 10)
Feb 5 Value-Based RL and Temporal Difference Learning Slides Intro to RL, TD methods
Feb 10 Q-Learning and DQN Slides Q-Learning, Nature DQN
Feb 12 Intro to Policy-Gradients for RL Slides Read Intro Parts 1-3 Homework 4: Q-Learning and DQN (due Feb 25)
Feb 19 Policy Gradients and REINFORCE
Feb 24 Alpha Go Slides Alpha Go
Feb 26 No Class: Do Reading Assignment Instead Submit a reading report on Alpha Go Zero via Canvas by midnight Feb 26.
Mar 3 Special Topics: Shared Control, Early Failure Detection for Robot Surgery VOSA, Early Failure Detection Pick one paper from the readings for today and submit reading report before class.
Mar 5 Special Topics: RLHF for Robot Surgery, Explainable Reward Learning, Adversarial Attacks on Behavioral Cloning RLHF for Surgery, Reward DDTs, Adversarial Attacks Pick one paper from the readings for today and submit reading report before class.
Mar 10-14 Spring Break Final project team selection due.
Mar 17 Actor Critic Algorithms Slides A3C, PPO Final project pitch due on Canvas. Instructions here.
Mar 19 PPO Slides A3C, PPO
Mar 24 DDPG, TD3, and SAC Slides DDPG, TD3, SAC
Mar 26 Multi-Agent RL Slides MARL book (chapter 5), VDN, QMIX, MAPPO Homework 5 due March 28
Mar 31 Model Based RL Slides World Models, PlaNet Paper or PlaNet Blog, Dreamer Read one of the above papers/posts and submit a reading report before class.
Apr 2 Inverse RL and Reward Learning Slides Final Project Lit Review and Full Proposal due April 4.
Apr 7 RLHF Slides
Apr 9 LLM Agents and RL
Apr 14 Final project presentations Schedule
Apr 16 Final project presentations Schedule
Apr 21 Final project presentations Schedule
Apr 30 Final project reports due Use this template (simply go to the menu and select "copy project").

Additional Resources

Here you can find supplementary materials, links, etc.

PyTorch Tutorials

RL Code Resources

  • Clean RL: single-file implementations of popular RL algorithms.
  • Spinning Up in Deep RL: simple, clean implementations of popular RL algorithms. It has not been updated to work with the newest version of Gymnasium.
  • Stable Baselines: extensive codebase for running RL experiments, with many algorithm implementations.
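
A note on the Gymnasium compatibility caveat above: one likely source of friction with older code is the change in return values. The older Gym API returned only an observation from reset() and a 4-tuple (obs, reward, done, info) from step(), while Gymnasium returns (obs, info) from reset() and a 5-tuple (obs, reward, terminated, truncated, info) from step(). The sketch below shows one way to bridge the two. It is a minimal illustration only: NewStyleEnv, LegacyAPIAdapter, and run_legacy_episode are hypothetical names made up for this example, not part of any of the libraries listed, and the toy environment stands in for a real Gymnasium environment so the snippet runs without extra packages.

```python
class NewStyleEnv:
    """Hypothetical toy env that mimics Gymnasium's return conventions."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0, {}  # Gymnasium style: (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False                # no terminal state in this toy task
        truncated = self.t >= self.horizon  # episode ends on a time limit
        return self.t, 1.0, terminated, truncated, {}


class LegacyAPIAdapter:
    """Hypothetical wrapper exposing the older Gym-style return values."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        obs, _info = self.env.reset()
        return obs  # older style: observation only

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        done = terminated or truncated  # older style: a single done flag
        return obs, reward, done, info


def run_legacy_episode(env):
    """Episode loop written against the older 4-tuple API."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(0)  # fixed dummy action
        total += reward
    return total


if __name__ == "__main__":
    print(run_legacy_episode(LegacyAPIAdapter(NewStyleEnv(horizon=3))))
```

Collapsing terminated and truncated into a single done flag discards information: a time-limit truncation is not a true terminal state, which matters when bootstrapping value targets. For anything beyond getting old code to run, it is usually better to update the training loop to use the 5-tuple directly.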