How can a computer make decisions? This course will explore the theory and algorithms behind automated decision making. We will address a number of different settings of the problem: planning sequences of actions when the world behaves in a known and deterministic way, constructing a policy that specifies reactions to different situations when the world behaves probabilistically, and learning a policy or model when the way the world behaves is unknown in advance. We will consider applications in areas such as robot motion planning, transportation scheduling, and computer games.
Date | Subject | Reading | Notes | Assignment |
---|---|---|---|---|
9/9 | Introduction; discrete spaces, atomic representation | AIMA3E 2, 4.1 (AIMA2E 2, 4.3); Simulated Annealing | L1 | Proj 0 |
9/14 | Discrete spaces, factored representation | AIMA3E 6 (AIMA2E 5) | L2 | |
9/16 | Continuous spaces | AIMA3E 4.2 (AIMA2E 4.4); Conjugate Gradient | L3 | Proj 1 |
9/21 | Path search discrete atomic spaces: review of dynamic programming, A* | AIMA3E 3; PA 2.2, 2.3 | L4 | |
9/23 | Path search continuous spaces: configuration space and exact methods | AIMA3E 25.4; PA 4.2, 4.3, 6 | L5, L6 | |
9/28 | Path search continuous spaces: sampling-based approaches | PA 5 | ||
9/30 | Logic: factored and relational representations of big discrete spaces | AIMA3E 7, 8 | PS1 solutions | |
10/5 | Planning: situation calculus and PDDL | AIMA3E 10.1, 10.2 (AIMA2E 11.1, 11.2); PA 2.4, 2.5 | | Proj 2 |
10/7 | Planning: progression, regression, graphplan | AIMA3E 10.3 - 10.6 (AIMA2E 11.3 - 11.7) | PS2 solutions | |
10/12 | Planning: Other topics | AIMA3E 11 (AIMA2E 12) | ||
10/14 | Exam | |||
10/19 | Uncertainty about outcomes: lotteries, decision trees | AIMA3E 16.1 - 16.5 | ||
10/21 | Markov decision processes | AIMA3E 17.1 - 17.2 | ||
10/26 | More MDPs, plus factored discrete representation | AIMA3E 17.1 - 17.2; Papers | ||
10/28 | Reinforcement learning | AIMA3E 21; Sutton and Barto | | Proj 3 |
11/2 | Function and policy approximation in RL | Non-linear TD, Linear Q, Policy gradient, PSDP | ||
11/4 | Uncertainty about state: value of information | AIMA3E 16.6, 4.4, 17.4 (AIMA2E 16.6, 3.6, 17.4) | ||
11/9 | POMDPs | LPK paper, Smith thesis, AIMA3E 15 | ||
11/11 | Holiday | |||
11/16 | POMDP algorithms | |||
11/18 | More POMDP algorithms | Papers | ||
11/23 | Multi-agent decision making: Frans Oliehoek | | slides | PS3 (revised) |
11/25 | Holiday | |||
11/30 | Search and learning in games: David Silver | | PS3 solutions | |
12/2 | Exam | |||
12/7 | Final project presentations | |||
12/9 | Final project presentations | |||