
Mathematical Foundations of Reinforcement Learning (English Edition)

  • Author: Shiyu Zhao (趙世鈺) | Executive editor: Guo Sai (郭賽)
  • Publisher: Tsinghua University Press
  • ISBN: 9787302658528
  • Publication date: 2024/07/01
  • Binding: Paperback
  • Pages: 301
  • Price: RMB 118

Synopsis
    Starting from the most basic concepts of reinforcement learning, this book introduces the fundamental analysis tools, including the Bellman equation and the Bellman optimality equation (a standard form of the Bellman equation is sketched after this summary), then extends them to model-based and model-free reinforcement learning algorithms, and finally to reinforcement learning algorithms based on function approximation. The book emphasizes introducing concepts, analyzing problems, and analyzing algorithms from a mathematical perspective; it does not emphasize the programming implementation of the algorithms. No prior background in reinforcement learning is required; readers only need some knowledge of probability theory and linear algebra. Readers who already have a foundation in reinforcement learning will find that the book helps them understand certain topics more deeply and offers new perspectives.
    The book is intended for undergraduate and graduate students, researchers, and practitioners in industry or research institutes who are interested in reinforcement learning.
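    For reference, the Bellman equation named above can be written in a standard elementwise form for a given policy π. The notation below follows common reinforcement-learning convention and is offered as a sketch, not a quotation from the book:

    v_\pi(s) = \sum_{a} \pi(a \mid s) \Big[ \sum_{r} p(r \mid s, a)\, r + \gamma \sum_{s'} p(s' \mid s, a)\, v_\pi(s') \Big] \quad \text{for all states } s,

    where γ ∈ [0, 1) is the discount rate, p(r|s,a) and p(s'|s,a) are the reward and state-transition probabilities, and v_π is the state-value function. Chapter 2 in the table of contents below treats this equation, its matrix-vector form, and how to solve it for the state values.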

About the Author
Shiyu Zhao (趙世鈺) | Executive editor: Guo Sai (郭賽)
    Shiyu Zhao is a Distinguished Researcher with the AI division of the School of Engineering at Westlake University, head of the Intelligent Unmanned Systems Laboratory, and a recipient of the Young Talent project of the national program for recruiting high-level overseas talent. He received his bachelor's and master's degrees from Beihang University and his Ph.D. from the National University of Singapore, and previously served as a Lecturer in the Department of Automatic Control and Systems Engineering at the University of Sheffield, UK. He is devoted to developing interesting, useful, and challenging next-generation robotic systems, with a focus on control, decision-making, and perception in multi-robot systems.

Table of Contents
Overview of this Book
Chapter 1  Basic Concepts
  1.1  A grid world example
  1.2  State and action
  1.3  State transition
  1.4  Policy
  1.5  Reward
  1.6  Trajectories, returns, and episodes
  1.7  Markov decision processes
  1.8  Summary
  1.9  Q&A
Chapter 2  State Values and the Bellman Equation
  2.1  Motivating example 1: Why are returns important?
  2.2  Motivating example 2: How to calculate returns?
  2.3  State values
  2.4  The Bellman equation
  2.5  Examples for illustrating the Bellman equation
  2.6  Matrix-vector form of the Bellman equation
  2.7  Solving state values from the Bellman equation
    2.7.1  Closed-form solution
    2.7.2  Iterative solution
    2.7.3  Illustrative examples
  2.8  From state value to action value
    2.8.1  Illustrative examples
    2.8.2  The Bellman equation in terms of action values
  2.9  Summary
  2.10  Q&A
Chapter 3  Optimal State Values and the Bellman Optimality Equation
  3.1  Motivating example: How to improve policies?
  3.2  Optimal state values and optimal policies
  3.3  The Bellman optimality equation
    3.3.1  Maximization of the right-hand side of the BOE
    3.3.2  Matrix-vector form of the BOE
    3.3.3  Contraction mapping theorem
    3.3.4  Contraction property of the right-hand side of the BOE
  3.4  Solving an optimal policy from the BOE
  3.5  Factors that influence optimal policies
  3.6  Summary
  3.7  Q&A
Chapter 4  Value Iteration and Policy Iteration
  4.1  Value iteration
    4.1.1  Elementwise form and implementation
    4.1.2  Illustrative examples
  4.2  Policy iteration
    4.2.1  Algorithm analysis
    4.2.2  Elementwise form and implementation
    4.2.3  Illustrative examples
  4.3  Truncated policy iteration
    4.3.1  Comparing value iteration and policy iteration
    4.3.2  Truncated policy iteration algorithm
  4.4  Summary
  4.5  Q&A
Chapter 5  Monte Carlo Methods
  5.1  Motivating example: Mean estimation
  5.2  MC Basic: The simplest MC-based algorithm
    5.2.1  Converting policy iteration to be model-free
    5.2.2  The MC Basic algorithm
    5.2.3  Illustrative examples
  5.3  MC Exploring Starts
    5.3.1  Utilizing samples more efficiently
    5.3.2  Updating policies more efficiently
    5.3.3  Algorithm description
  5.4  MC ε-Greedy: Learning without exploring starts
    5.4.1  ε-greedy policies
    5.4.2  Algorithm description
    5.4.3  Illustrative examples
  5.5  Exploration and exploitation of ε-greedy policies
  5.6  Summary
  5.7  Q&A
Chapter 6  Stochastic Approximation
  6.1  Motivating example: Mean estimation
  6.2  Robbins-Monro algorithm
    6.2.1  Convergence properties
    6.2.2  Application to mean estimation
  6.3  Dvoretzky's convergence theorem
    6.3.1  Proof of Dvoretzky's theorem
    6.3.2  Application to mean estimation
    6.3.3  Application to the Robbins-Monro theorem
    6.3.4  An extension of Dvoretzky's theorem
  6.4  Stochastic gradient descent
    6.4.1  Application to mean estimation
    6.4.2  Convergence pattern of SGD
    6.4.3  A deterministic formulation of SGD
    6.4.4  BGD, SGD, and mini-batch GD
    6.4.5  Convergence of SGD
  6.5  Summary
  6.6  Q&A
Chapter 7  Temporal-Difference Methods
  7.1  TD learning of state values
    7.1.1  Algorithm description
    7.1.2  Property analysis
    7.1.3  Convergence analysis
  7.2  TD learning of action values: Sarsa
    7.2.1  Algorithm description
    7.2.2  Optimal policy learning via Sarsa
  7.3  TD learning of action values: n-step Sarsa
  7.4  TD learning of optimal action values: Q-learning
    7.4.1  Algorithm description
    7.4.2  Off-policy vs. on-policy
    7.4.3  Implementation
    7.4.4  Illustrative examples
  7.5  A unified viewpoint
  7.6  Summary
  7.7  Q&A
Chapter 8  Value Function Approximation
  8.1  Value representation: From table to function
  8.2  TD learning of state values with function approximation
    8.2.1  Objective function
    8.2.2  Optimization algorithms
    8.2.3  Selection of function approximators
    8.2.4  Illustrative examples
    8.2.5  Theoretical analysis
  8.3  TD learning of action values with function approximation
    8.3.1  Sarsa with function approximation
    8.3.2  Q-learning with function approximation
  8.4  Deep Q-learning
    8.4.1  Algorithm description
    8.4.2  Illustrative examples
  8.5  Summary
  8.6  Q&A
Chapter 9  Policy Gradient Methods
  9.1  Policy representation: From table to function
  9.2  Metrics for defining optimal policies
  9.3  Gradients of the metrics
    9.3.1  Derivation of the gradients in the discounted case
    9.3.2  Derivation of the gradients in the undiscounted case
  9.4  Monte Carlo policy gradient (REINFORCE)
  9.5  Summary
  9.6  Q&A
Chapter 10  Actor-Critic Methods
  10.1  The simplest actor-critic algorithm (QAC)
  10.2  Advantage actor-critic (A2C)
    10.2.1  Baseline invariance
    10.2.2  Algorithm description
  10.3  Off-policy actor-critic
    10.3.1  Importance sampling
    10.3.2  The off-policy policy gradient theorem
    10.3.3  Algorithm description
  10.4  Deterministic actor-critic
    10.4.1  The deterministic policy gradient theorem
    10.4.2  Algorithm description
  10.5  Summary
  10.6  Q&A
Appendix A  Preliminaries for Probability Theory
Appendix B  Measure-Theoretic Probability Theory
Appendix C  Convergence of Sequences
  C.1  Convergence of deterministic sequences
  C.2  Convergence of stochastic sequences
Appendix D  Preliminaries for Gradient Descent
Bibliography

Symbols
Index
