Abstract:
To address the excessively long training cycle of deep reinforcement learning when a manipulator is trained with all degrees of freedom in a 3D environment, a fast training method for manipulators is proposed. The grasping task is decomposed so that training of the manipulator's lateral servo and longitudinal servo is decoupled; this dimensionality reduction compresses the solution space and simplifies training while preserving the execution accuracy of the actions. The deep deterministic policy gradient (DDPG) algorithm is also improved: a second value estimation is performed on the same batch of samples to delay the update of the policy network, and prioritized experience replay is added, which together improve the training efficiency of DDPG. Experimental results show that the proposed method has low training complexity, trains quickly and at low cost, and achieves a grasping success rate of up to 98%, which favors its adoption in industrial settings.
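
The two DDPG modifications named above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes proportional prioritization (sampling probability proportional to priority raised to a power `alpha`), and it assumes one possible reading of the "second value estimation": the critic is updated twice on a batch before the policy (actor) network is updated once. The class `PrioritizedReplay`, the function `update_schedule`, and the values of `alpha` and `eps` are hypothetical names and settings chosen for illustration; importance-sampling weights are omitted for brevity.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized replay on plain lists (O(n) sampling, illustration only)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha  # assumed value: how strongly priorities skew sampling
        self.eps = eps      # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be sampled soon after being stored.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer full: overwrite the oldest slot.
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size):
        # Sample with replacement, weighted by the priority-based probabilities.
        indices = self.rng.choices(
            range(len(self.items)), weights=self.probabilities(), k=batch_size
        )
        return indices, [self.items[i] for i in indices]

    def update_priorities(self, indices, td_errors):
        # Transitions with larger absolute TD error are replayed more often.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps


def update_schedule(num_batches, critic_passes=2):
    """Order of updates under the assumed reading:
    `critic_passes` critic updates on each batch, then one actor update."""
    steps = []
    for batch in range(num_batches):
        steps.extend(("critic", batch) for _ in range(critic_passes))
        steps.append(("actor", batch))
    return steps
```

In a full training loop, each `("critic", b)` step would recompute the TD targets on batch `b` and call `update_priorities` with the resulting errors, while each `("actor", b)` step would apply the deterministic policy gradient once; those network updates are outside the scope of this sketch.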