Convergence analyses of existing reinforcement learning methods mostly address discrete-state problems; for continuous-state problems, convergence analysis is limited to the simple LQR control problem. This paper analyzes two existing reinforcement learning methods for the LQR problem and, to address their shortcomings, proposes a reinforcement learning method that requires only partial model information. The method uses recursive least-squares TD (RLS-TD) to estimate the value-function parameters and recursive least squares (RLS) to estimate the greedy improved policy. A theoretical analysis of convergence under ideal conditions is given. Simulation results show that the method converges to the optimal control policy.
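To make the RLS-TD value-estimation step concrete, the following is a minimal sketch of a standard RLS-TD(0) recursion for evaluating a fixed linear policy on a scalar LQR problem. It is illustrative only: the system parameters (`a`, `b`, `q`, `ru`), the policy gain `k`, and the discount factor are hypothetical choices, not the paper's experimental setup, and the recursion shown is the generic RLS-TD update for linear function approximation rather than the paper's exact algorithm.

```python
import numpy as np

# Hypothetical scalar LQR setup: x' = a x + b u, stage cost q x^2 + ru u^2,
# fixed linear policy u = -k x, discounted by gamma. All values are
# illustrative assumptions, not taken from the paper.
a, b = 0.9, 0.5
q, ru = 1.0, 0.1
k = 0.3
gamma = 0.95

# Under u = -k x the value function is V(x) = p x^2, where p solves the
# closed-loop Bellman equation p = (q + ru k^2) + gamma (a - b k)^2 p.
a_cl = a - b * k
p_true = (q + ru * k**2) / (1.0 - gamma * a_cl**2)

# RLS-TD(0): theta estimates p using the quadratic feature phi(x) = x^2.
theta = np.zeros(1)
P = np.eye(1) * 1e3          # large initial "covariance" matrix

x = 1.0
for _ in range(50):
    u = -k * x
    cost = q * x**2 + ru * u**2
    x_next = a * x + b * u
    phi = np.array([x**2])
    phi_next = np.array([x_next**2])
    d = phi - gamma * phi_next              # TD regressor phi_t - gamma phi_{t+1}
    K = P @ phi / (1.0 + d @ P @ phi)       # RLS-TD gain
    theta = theta + K * (cost - d @ theta)  # parameter update on the TD residual
    P = P - np.outer(K, d @ P)              # covariance update
    # Restart the (deterministic, stable) trajectory to keep the data exciting.
    x = x_next if abs(x_next) > 1e-3 else 1.0

print(theta[0], p_true)
```

Because the dynamics here are deterministic and the true value function lies in the feature span, the estimate converges quickly to the analytic `p_true`; in the stochastic setting the same recursion averages over the noise.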