About the Book
《非线性系统自学习最优控制:自适应动态规划方法(英文版)》(Self-Learning Optimal Control of Nonlinear Systems: Adaptive Dynamic Programming Approach, English edition) presents a class of novel self-learning optimal control schemes based on adaptive dynamic programming (ADP) techniques, which quantitatively obtain the optimal control laws of the systems. It analyzes the properties of these iterative methods, including the convergence of the iterative value functions and the stability of the system under the iterative control laws, which together guarantee the effectiveness of the developed schemes. When the system model is known, the self-learning optimal control is designed on the basis of the model; when the model is unknown, adaptive dynamic programming is implemented from system data, effectively making the performance of the system converge to the optimum.
With various real-world examples to complement and substantiate the mathematical analysis, the book is a valuable guide for engineers, researchers, and students in control science and engineering.
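To make the iterative flavor of these schemes concrete, here is a minimal value-iteration sketch in the spirit of the adaptive dynamic programming algorithms the book analyzes; the scalar dynamics, quadratic utility, and grid discretization below are hypothetical placeholders, not examples from the book:

```python
import numpy as np

# Value-iteration sketch for a discrete-time nonlinear system
# x_{k+1} = f(x_k, u_k) with utility U(x, u) = x^2 + u^2.
# Dynamics, grids, and tolerance are illustrative assumptions only.

def f(x, u):
    return 0.9 * np.sin(x) + u            # hypothetical nonlinear dynamics

def U(x, u):
    return x**2 + u**2                     # quadratic stage cost (utility)

x_grid = np.linspace(-2.0, 2.0, 201)       # discretized state space
u_grid = np.linspace(-1.0, 1.0, 101)       # discretized control space

V = np.zeros_like(x_grid)                  # V_0(x) = 0: usual VI initialization

for i in range(200):                       # iterative value function updates
    V_next = np.empty_like(V)
    for j, x in enumerate(x_grid):
        # V_{i+1}(x) = min_u [ U(x, u) + V_i(f(x, u)) ]
        x_next = f(x, u_grid)
        cost = U(x, u_grid) + np.interp(x_next, x_grid, V)
        V_next[j] = cost.min()
    if np.max(np.abs(V_next - V)) < 1e-6:  # stop once the iterates stabilize
        break
    V = V_next
```

The stopping test mirrors, in miniature, the convergence of the iterative value functions that the book establishes analytically.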
Contents
1 Principle of Adaptive Dynamic Programming 1
1.1 Dynamic Programming 1
1.1.1 Discrete-Time Systems 1
1.1.2 Continuous-Time Systems 2
1.2 Original Forms of Adaptive Dynamic Programming 3
1.2.1 Principle of Adaptive Dynamic Programming 4
1.3 Iterative Forms of Adaptive Dynamic Programming 9
1.3.1 Value Iteration 9
1.3.2 Policy Iteration 10
1.4 About This Book 11
References 14
2 An Iterative ε-Optimal Control Scheme for a Class of Discrete-Time Nonlinear Systems with Unfixed Initial State 19
2.1 Introduction 19
2.2 Problem Statement 20
2.3 Properties of the Iterative Adaptive Dynamic Programming Algorithm 21
2.3.1 Derivation of the Iterative ADP Algorithm 21
2.3.2 Properties of the Iterative ADP Algorithm 23
2.4 The ε-Optimal Control Algorithm 28
2.4.1 The Derivation of the ε-Optimal Control Algorithm 28
2.4.2 Properties of the ε-Optimal Control Algorithm 32
2.4.3 The ε-Optimal Control Algorithm for Unfixed Initial State 34
2.4.4 The Expressions of the ε-Optimal Control Algorithm 37
2.5 Neural Network Implementation for the ε-Optimal Control Scheme 37
2.5.1 The Critic Network 38
2.5.2 The Action Network 39
2.6 Simulation Study 40
2.7 Conclusions 42
References 43
3 Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based Q-Learning 47
3.1 Introduction 47
3.2 Preliminaries and Assumptions 49
3.2.1 Problem Formulations 49
3.2.2 Derivation of the Discrete-Time Q-Learning Algorithm 50
3.3 Properties of the Discrete-Time Q-Learning Algorithm 52
3.3.1 Non-Discount Case 52
3.3.2 Discount Case 59
3.4 Neural Network Implementation for the Discrete-Time Q-Learning Algorithm 64
3.4.1 The Action Network 65
3.4.2 The Critic Network 67
3.4.3 Training Phase 69
3.5 Simulation Study 70
3.5.1 Example 1 70
3.5.2 Example 2 76
3.6 Conclusion 81
References 82
4 A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems 85
4.1 Introduction 85
4.2 Problem Formulation 86
4.3 Policy Iteration-Based Deterministic Q-Learning Algorithm for Discrete-Time Nonlinear Systems 87
4.3.1 Derivation of the Policy Iteration-Based Deterministic Q-Learning Algorithm 87
4.3.2 Properties of the Policy Iteration-Based Deterministic Q-Learning Algorithm 89
4.4 Neural Network Implementation for the Policy Iteration-Based Deterministic Q-Learning Algorithm 93
4.4.1 The Critic Network 93
4.4.2 The Action Network 95
4.4.3 Summary of the Policy Iteration-Based Deterministic Q-Learning Algorithm 96
4.5 Simulation Study 97
4.5.1 Example 1 97
4.5.2 Example 2 100
4.6 Conclusion 107
References 107
5 Nonlinear Neuro-Optimal Tracking Control via Stable Iterative Q-Learning Algorithm 111
5.1 Introduction 111
5.2 Problem Statement 112
5.3 Policy Iteration Q-Learning Algorithm for Optimal Tracking Control 114
5.4 Properties of the Policy Iteration Q-Learning Algorithm 114
5.5 Neural Network Implementation for the Policy Iteration Q-Learning Algorithm 119
5.5.1 The Critic Network 120
5.5.2 The Action Network 120
5.6 Simulation Study 121
5.6.1 Example 1 122
5.6.2 Example 2 125
5.7 Conclusions 129
References 129
6 Model-Free Multiobjective Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems with General Performance Index Functions 133
6.1 Introduction 133
6.2 Preliminaries 134
6.3 Multiobjective Adaptive Dynamic Programming Method 135
6.4 Model-Free Incremental Q-Learning Method 145
6.5 Neural Network Implementation for the Incremental Q-Learning Method 147
6.5.1 The Critic Network 148
6.5.2 The Action Network 149
6.5.3 The Procedure of the Model-Free Incremental Q-Learning Method 150
6.6 Convergence Proof 150
6.7 Simulation Study 153
6.7.1 Example 1 153
6.7.2 Example 2 155
6.8 Conclusion 157
References 157
7 Multiobjective Optimal Control for a Class of Unknown Nonlinear Systems Based on Finite-Approximation-Error ADP Algorithm 159
7.1 Introduction 159
7.2 General Formulation 160
7.3 Optimal Solution Based on Finite-Approximation-Error ADP 162
7.3.1 Data-Based Identifier of Unknown System Dynamics 162
7.3.2 Derivation of the ADP Algorithm with Finite Approximation Errors 166
7.3.3 Convergence Analysis of the Iterative ADP Algorithm 168
7.4 Implementation of the Iterative ADP Algorithm 173
7.4.1 Critic Network 174
7.4.2 The Action Network 174
7.4.3 The Procedure of the ADP Algorithm 175
7.5 Simulation Study 175
7.5.1 Example 1 176
7.5.2 Example 2 179
7.6 Conclusions 182
References 182
8 A New Approach for a Class of Continuous-Time Chaotic Systems Optimal Control by Online ADP Algorithm 185
8.1 Introduction 185
8.2 Problem Statement 185
8.3 Optimal Control Based on Online ADP Algorithm 187
8.3.1 Design Method of the Critic Network and the Action Network 188
8.3.2 Stability Analysis
Frontiers of Intelligent Control and Optimization: A New Paradigm of Adaptive Decision-Making for Complex Dynamic Systems

This book focuses on the intersection of modern control theory, optimization algorithms, and computational intelligence, examining in depth how to design control systems with autonomous learning and global optimization capabilities under uncertainty and nonlinearity.

Engineering practice today faces challenges that traditional model-driven control methods struggle to address: system models that are extremely difficult or costly to obtain; system dynamics that change unpredictably over time; and control objectives that are themselves complex optimization problems depending on real-time performance evaluation. The book aims to build a theoretical framework that goes beyond classical feedback linearization and exact model compensation, concentrating on the core principles, algorithmic implementation, and engineering application of reinforcement learning (RL) based adaptive optimization methods for high-dimensional, strongly coupled, nonlinear control problems.

The book is organized around the following interrelated, forward-looking themes:

Part One: Foundations of Nonlinear System Analysis and Optimization
This part lays the mathematical and theoretical groundwork for the advanced algorithms that follow, examining the potential of reinforcement learning from a classical control-theoretic viewpoint and providing theoretical guarantees for the stability and convergence of complex systems.

1. Limitations of modeling complex nonlinear systems, and new descriptions: analyzes the inherent bottlenecks of classical system descriptions (state-space models, transfer functions) for high-degree-of-freedom robots, complex chemical reactors, or power grids, and explores how higher-order tensor representations and dynamic manifold theory can abstractly describe the intrinsic non-smoothness and incomplete observability of such systems.

2. A dynamic-optimization view of performance indices: revisits the performance indices of optimal control (such as JSP and the optimal Hamilton-Jacobi equation), stressing that when the model is unknown or time-varying the performance index itself must be learnable and correctable; discusses observer-based performance estimation and how to recast the performance index as a desired space of feedback gains rather than a fixed control law.

3. Non-traditional approaches to stability analysis: because adaptive controllers adjust themselves online, classical Lyapunov stability theory is hard to apply directly; this part introduces interval analysis and invariant energy functions for assessing the stability of closed-loop systems with learning components, providing qualitative tools for analyzing algorithmic robustness.

Part Two: Value-Based and Policy-Based Self-Learning Frameworks
This part is the core of the book, detailing how reinforcement learning ideas can be used to build an algorithmic framework that discovers optimal control policies autonomously.

4. Analytical decomposition of policy gradients and value functions: going beyond the standard actor-critic framework, the book dissects the probabilistic foundations behind policy-gradient methods (such as REINFORCE and PPO), and discusses nonparametric estimation of value functions for continuous control domains using Gaussian processes (GPs) or kernel methods to cope with Gaussian noise and sparse sampling.

5. Model-based adaptive planning: emphasizes the importance of building "weak" or local models; unlike purely data-driven model-free methods, this part shows how limited interaction data can yield local models quickly via local linearization or sparse system identification, and how integrating these local models into the planning step markedly improves learning efficiency and robustness to anomalous inputs.

6. Measuring and designing effective exploration: in complex control environments, effective exploration is key to converging to a global optimum; the book proposes information-gain-driven exploration strategies that no longer rely on simple ε-greedy selection or random perturbations, examining how to quantify how well the current policy covers the state space and how to guide decisions with uncertainty propagation, preferring actions that maximize the expected future information (a rough sketch of this idea appears after this description).

Part Three: Robustness and Computational Efficiency for Engineering Deployment
This part turns the theoretical algorithms into deployable engineering solutions, focusing on real-time operation, resistance to environmental disturbances, and practical data handling.

7. Online learning and recalibration with heterogeneous data streams: in real industrial IoT (IIoT) environments, control signals, sensor readings, and performance feedback may arrive at different sampling rates and with different delays; the book proposes time-series convolutional networks (TS-CNNs) as a preprocessing stage that synchronizes heterogeneous data and extracts features, keeping the learning process consistent.

8. Integrating constraint handling and safety-critical control: real systems (such as UAVs and chemical processes) must respect hard constraints such as physical limits and safety thresholds; the book examines projected gradient descent in the policy-update stage and shows how barrier functions combined with value-function approximation yield control laws that satisfy state and input constraints throughout learning (a second sketch after this description illustrates the projection step).

9. Parallelization and edge deployment of learning algorithms: to meet the high-frequency demands of modern control systems, this part addresses computational complexity, covering GPU-based tensor acceleration, decoupling policy evaluation from model updates for asynchronous parallel training, and pruning and quantizing trained lightweight policy networks so they fit on resource-constrained edge controllers.

A distinguishing feature of the book is that it is not tied to any single deep-learning architecture; instead it integrates the rigor of control theory, the global perspective of optimization theory, and the adaptivity of modern computational intelligence, offering a clear and verifiable theoretical and technical path toward the next generation of autonomous, efficient, and reliable control schemes for complex dynamic systems.
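A rough sketch of the uncertainty-guided exploration idea from item 6 above, using disagreement within a small ensemble of one-step models as a simple stand-in for expected information gain; the ensemble, dynamics, and candidate actions are hypothetical placeholders, not material from the book:

```python
import numpy as np

# Uncertainty-guided exploration: instead of epsilon-greedy, prefer the action
# whose predicted next state the model ensemble disagrees about most, as a
# proxy for expected information gain about the unknown dynamics.

rng = np.random.default_rng(0)

def make_model():
    """A randomly perturbed one-step model standing in for a learned ensemble member."""
    a = 0.9 + 0.05 * rng.standard_normal()
    b = 1.0 + 0.05 * rng.standard_normal()
    return lambda x, u: a * np.sin(x) + b * u

ensemble = [make_model() for _ in range(5)]          # hypothetical model ensemble
candidate_actions = np.linspace(-1.0, 1.0, 21)       # candidate controls to compare

def exploration_score(x, u):
    """Ensemble disagreement (variance of predicted next states) as an information proxy."""
    preds = np.array([m(x, u) for m in ensemble])
    return preds.var()

x = 0.3                                               # current state (illustrative)
best_u = max(candidate_actions, key=lambda u: exploration_score(x, u))
print(best_u)   # the action expected to be most informative about the dynamics
```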
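And a minimal sketch of the projected-gradient constraint handling mentioned in item 8, assuming a simple box constraint on the control input; the cost, step size, and bounds are illustrative assumptions only:

```python
import numpy as np

# Projected gradient descent: after each gradient step, the control is
# projected back onto the hard constraint set (here a box), so every
# iterate remains feasible during learning.

u_min, u_max = -1.0, 1.0                  # hard input constraints (assumed)

def project(u):
    """Projection onto the admissible box [u_min, u_max]."""
    return np.clip(u, u_min, u_max)

def cost_gradient(x, u):
    """Gradient of a quadratic stage cost U(x, u) = x^2 + u^2 with respect to u."""
    return 2.0 * u

x = 0.5                                    # current state (illustrative)
u = 5.0                                    # an initially infeasible control
for _ in range(50):
    u = u - 0.1 * cost_gradient(x, u)      # unconstrained gradient step
    u = project(u)                         # projection restores feasibility

print(u)   # converges toward the constrained minimizer inside the box
```

Clipping is exactly the projection onto a box; for more general constraint sets the projection step becomes a small optimization problem of its own.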