CS 5955/6955 Advanced Artificial Intelligence

Class Overview

This course covers advanced algorithms for intelligent sequential decision making, with an emphasis on modern deep learning-based methods. The class will cover both the theory and the practical details of the algorithms behind recent breakthroughs in many areas of AI decision making, including game playing, robotics, recommendation systems, and large language models. Topics include bandit algorithms, Markov decision processes, partially observable Markov decision processes, reinforcement learning, imitation learning, inverse reinforcement learning, and reinforcement learning from human feedback. This will be a fun but challenging class. Because it is an advanced AI class, we will assume a basic understanding of machine learning (supervised learning, loss functions, gradient descent) and of core AI concepts (search problems, MDPs, high-level ideas in RL). These topics can be picked up during the class, since we will try to keep things self-contained, but we will go over the basics quickly to get to more advanced material. Students should be comfortable writing Python code and digging through and understanding code written by others.

Class Schedule

Date Topic Slides Readings Assignment
Jan 6 Class Intro Slides Optional: Python Notes (Alan Kuntz) Optional: Python Tutorial (Berkeley AI Class)
Jan 8 Behavioral Cloning Slides Optional: Behavioral Cloning from Observation, DAgger, ThriftyDAgger Behavior Cloning in PyTorch (due Friday Jan 17)
Jan 13 Intro to Advanced Behavior Cloning Slides Choose one and submit a reading report before class: Implicit Behavioral Cloning, Action Chunking Transformer, Diffusion Policy
Jan 15 More Advanced Behavior Cloning Slides
Jan 22 Multi-Armed Bandits and Evaluative Feedback Slides Sutton and Barto 2.1-2.5 Multi-Armed Bandits (due Fri Jan 31)
Jan 27 More Bandits Slides
Jan 29 Intro to Markov Decision Processes Slides
Feb 3 Solving MDPs Slides Exact Solution Methods for MDPs Homework 3 (due Feb 10)
Feb 5 Value-Based RL and Temporal Difference Learning Slides Intro to RL, TD methods
Feb 10 Q-Learning and DQN Slides Q-Learning, Nature DQN
Feb 12 Intro to Policy-Gradients for RL Slides Read Intro Parts 1-3 Homework 4: Q-Learning and DQN (due Feb 25)
Feb 19 Policy Gradients and REINFORCE
Feb 24 AlphaGo Slides AlphaGo
Feb 26 No Class: Do Reading Assignment Instead Submit a reading report on AlphaGo Zero via Canvas by midnight Feb 26.
Mar 3 Special Topics: Shared Control, Early Failure Detection for Robot Surgery VOSA, Early Failure Detection Pick one paper from today's readings and submit a reading report before class.
Mar 5 Special Topics: RLHF for Robot Surgery, Explainable Reward Learning, Adversarial Attacks on Behavioral Cloning RLHF for Surgery, Reward DDTs, Adversarial Attacks Pick one paper from today's readings and submit a reading report before class.
Mar 10-14 Spring Break Final project team selection due.
Mar 17 Actor Critic Algorithms and PPO Final project pitch due on Canvas. Instructions here.
Mar 19 DDPG and SAC
Mar 24 Model-Based RL
Mar 26 Multi-Agent RL
Mar 31 Offline RL
Apr 2 RLHF
Apr 7 Inverse RL
Apr 9 More Inverse RL
Apr 14 LLM Agents and RL
Apr 16 Final project presentations
Apr 21 Final project presentations
Apr 30 Final project reports due

Additional Resources

Here you can find supplementary materials, links, etc.

PyTorch Tutorials