A Fast Dimensionality-Reduction Training Method for Deep Reinforcement Learning of Robotic Manipulators

王敏; 王赞; 李珅; 陈立家; 范贤博俊; 王晨露; 刘名果

doi:10.3969/j.issn.1000-386x.2025.04.040

FAST TRAINING METHOD OF DEEP REINFORCEMENT LEARNING DIMENSIONALITY REDUCTION FOR MECHANICAL ARM

Abstract

To address the excessively long training cycles that deep reinforcement learning algorithms incur when training a manipulator over all degrees of freedom in a 3D environment, a solution-space-oriented fast training method for manipulator deep reinforcement learning is proposed. First, the grasping task is decomposed so that training of the manipulator's lateral and longitudinal servos is decoupled; this dimensionality reduction compresses the solution space and simplifies training while preserving action execution accuracy. Second, the deep deterministic policy gradient (DDPG) algorithm is improved: a second value estimate is computed on the same batch of samples to delay updates of the policy network, supplemented by prioritized experience replay, which effectively improves the training efficiency of DDPG. Experimental results show that the proposed method offers low training complexity, fast training, and low cost, with a grasping success rate of up to 98%, which favors its adoption in industrial settings.
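The abstract names prioritized experience replay as one ingredient of the improved DDPG but gives no implementation details. The following is a minimal sketch of the generic proportional variant of that technique, not the authors' code. The class name `PrioritizedReplayBuffer` and the hyperparameters `alpha`, `beta`, and `eps` are assumptions introduced here for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative sketch only)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity   # maximum number of stored transitions
        self.alpha = alpha         # assumed: how strongly priorities skew sampling
        self.eps = eps             # assumed: keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0               # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so each one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # After a critic update, the priority becomes |TD error| + eps.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG training loop, the critic loss for each sampled transition would typically be scaled by the returned weight, and `update_priorities` would be called with the fresh TD errors of that batch. How the paper combines this with its second value estimate and delayed policy update is not specified in the abstract.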
