The balance between exploration and exploitation is a key issue in action selection for Q-learning. Pure exploitation quickly traps the agent in a local optimum; exploration can escape local optima and accelerate learning, but too much of it degrades the algorithm's performance. By casting the search for an optimal policy in Q-learning as the search for an optimal solution to a combinatorial optimization problem, the Metropolis criterion of the simulated annealing algorithm is applied to the trade-off between exploration and exploitation, yielding SA-Q-learning, a Q-learning algorithm based on the Metropolis criterion. Experimental comparison shows that SA-Q-learning converges faster than standard Q-learning and avoids the performance degradation caused by excessive exploration.
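The abstract does not include code, so the following is a minimal Python sketch of how Metropolis-criterion action selection might be combined with a standard tabular Q-learning update. The function names (`metropolis_action`, `sa_q_learning`), the geometric cooling schedule, the hyperparameter values, and the assumed `env` interface (`reset()` returning a state, `step(action)` returning `(next_state, reward, done)`) are all illustrative assumptions, not details taken from the paper.

```python
import math
import random

def metropolis_action(q_row, temperature):
    """Pick an action by the Metropolis criterion: compare a random
    candidate against the greedy action, and accept the candidate with
    probability min(1, exp((Q_candidate - Q_greedy) / T)).  A high
    temperature T favours exploration; as T -> 0 the rule degenerates
    to pure greedy (exploitation) selection."""
    n_actions = len(q_row)
    greedy = max(range(n_actions), key=lambda a: q_row[a])
    candidate = random.randrange(n_actions)
    delta = q_row[candidate] - q_row[greedy]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate
    return greedy

def sa_q_learning(env, n_states, n_actions, episodes=500,
                  alpha=0.1, gamma=0.95, t0=1.0, cool=0.99, t_min=1e-3):
    """Tabular Q-learning with Metropolis action selection and an
    assumed geometric cooling schedule T <- cool * T per episode."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    temperature = t0
    for _ in range(episodes):
        state = env.reset()          # assumed env interface
        done = False
        while not done:
            action = metropolis_action(q[state], temperature)
            next_state, reward, done = env.step(action)
            # Standard one-step Q-learning backup.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
        temperature = max(t_min, cool * temperature)
    return q
```

The design mirrors the trade-off described above: early in training the high temperature makes the acceptance probability close to 1 even for worse-valued actions, so the agent explores broadly, while the cooling schedule gradually shifts selection toward the greedy action, curbing the late-stage over-exploration the paper identifies as a source of performance loss.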