Abstract:
In robot trajectory planning, deep reinforcement learning (DRL) based methods often suffer from low learning efficiency and convergence to locally optimal solutions. To address these defects, a curiosity network and a modified optimization framework, actor-critic-curiosity (A-C-C), are proposed. A-C-C enables the agent to consider problems in a more human-like way, paying more attention to the process of exploration than to the result. By promoting the exploration of unknown regions, A-C-C effectively improves the learning efficiency of the DRL method and avoids locally optimal solutions. The experimental results show that the proposed method can be combined with different reward functions to accelerate exploration efficiency by 43.6%-101.2%. The mean convergence is also improved by 4.8%-6.4%.
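The abstract does not specify the internals of the curiosity network. A minimal sketch of one common realization, a forward-dynamics prediction-error bonus added to the extrinsic reward before the critic update, is given below; the class name `CuriosityModule`, the network sizes, and the weight `beta` are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CuriosityModule(nn.Module):
    """Forward-dynamics curiosity (assumed design): predict the next
    state from the current state and action; the prediction error is
    used as an intrinsic reward, which is large in poorly explored
    regions of the state space."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        self.opt = torch.optim.Adam(self.parameters(), lr=1e-3)

    def intrinsic_reward(self, state, action, next_state):
        # Prediction error of the forward model = curiosity bonus.
        pred = self.net(torch.cat([state, action], dim=-1))
        loss = F.mse_loss(pred, next_state)
        # Train the forward model on the observed transition so the
        # bonus decays as a region becomes familiar.
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

# Usage sketch: blend extrinsic and intrinsic rewards; `beta` is a
# hypothetical exploration weight.
if __name__ == "__main__":
    cur = CuriosityModule(state_dim=6, action_dim=2)
    s, a, s_next = torch.randn(6), torch.randn(2), torch.randn(6)
    beta, r_ext = 0.1, 1.0
    r_total = r_ext + beta * cur.intrinsic_reward(s, a, s_next)
    print(f"shaped reward: {r_total:.3f}")
```

The shaped reward `r_total` would then feed the actor-critic update in place of the raw task reward, which is one way a curiosity term can be combined with different reward functions as the abstract describes.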