Contents
1. AlphaZero, Off-Line Training, and On-Line Play
1.1. Off-Line Training and Policy Iteration
1.2. On-Line Play and Approximation in Value Space - Truncated Rollout
1.3. The Lessons of AlphaZero
1.4. A New Conceptual Framework for Reinforcement Learning
1.5. Notes and Sources
2. Deterministic and Stochastic Dynamic Programming
2.1. Optimal Control Over an Infinite Horizon
2.2. Approximation in Value Space
2.3. Notes and Sources
3. An Abstract View of Reinforcement Learning
3.1. Bellman Operators
3.2. Approximation in Value Space and Newton's Method
3.3. Region of Stability
3.4. Policy Iteration, Rollout, and Newton's Method
3.5. How Sensitive is On-Line Play to the Off-Line Training Process?
3.6. Why Not Just Train a Policy Network and Use it Without On-Line Play?
3.7. Multiagent Problems and Multiagent Rollout
3.8. On-Line Simplified Policy Iteration
3.9. Exceptional Cases
3.10. Notes and Sources
4. The Linear Quadratic Case - Illustrations
4.1. Optimal Solution
4.2. Cost Functions of Stable Linear Policies
4.3. Value Iteration
4.4. One-Step and Multistep Lookahead - Newton Step Interpretations
4.5. Sensitivity Issues
4.6. Rollout and Policy Iteration
4.7. Truncated Rollout - Length of Lookahead Issues
4.8. Exceptional Behavior in Linear Quadratic Problems
4.9. Notes and Sources
5. Adaptive and Model Predictive Control
5.1. Systems with Unknown Parameters - Robust and PID Control
5.2. Approximation in Value Space, Rollout, and Adaptive Control
5.3. Approximation in Value Space, Rollout, and Model Predictive Control
5.4. Terminal Cost Approximation - Stability Issues
5.5. Notes and Sources
6. Finite Horizon Deterministic Problems - Discrete Optimization
6.1. Deterministic Discrete Spaces Finite Horizon Problems
6.2. General Discrete Optimization Problems
6.3. Approximation in Value Space
6.4. Rollout Algorithms for Discrete Optimization
6.5. Rollout and Approximation in Value Space with Multistep Lookahead
6.5.1. Simplified Multistep Rollout - Double Rollout
6.5.2. Incremental Rollout for Multistep Approximation in Value Space
6.6. Constrained Forms of Rollout Algorithms
6.7. Adaptive Control by Rollout with a POMDP Formulation
6.8. Rollout for Minimax Control
6.9. Small Stage Costs and Long Horizon - Continuous-Time Rollout
6.10. Epilogue
Appendix A: Newton's Method and Error Bounds
A.1. Newton's Method for Differentiable Fixed Point Problems
A.2. Newton's Method Without Differentiability of the Bellman Operator
A.3. Local and Global Error Bounds for Approximation in Value Space
A.4. Local and Global Error Bounds for Approximate Policy Iteration
References