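Path planning method for citrus picking manipulator based on deep reinforcement learning
Cite this article: XIONG Chunyuan, XIONG Juntao, YANG Zhengang, HU Wenxin. Path planning method for citrus picking manipulator based on deep reinforcement learning[J]. Journal of South China Agricultural University, 2023, 44(3): 473-483.
Authors: XIONG Chunyuan  XIONG Juntao  YANG Zhengang  HU Wenxin
Affiliation: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, Guangdong, China
Funding: National Natural Science Foundation of China (32071912); Guangzhou Basic Research Program (202102080337)
Abstract: [Objective] To address the low efficiency and poor path planning success rate observed when deep reinforcement learning is used to plan paths for picking manipulators in unstructured environments, a path planning method for citrus picking manipulators in unstructured environments is proposed, based on deep reinforcement learning (DRL) and the artificial potential field. [Method] First, the picking path planning problem was solved with reinforcement learning, and a reinforcement learning method combined with the artificial potential field was designed. Second, a long short-term memory (LSTM) structure was introduced to improve the Actor and Critic networks of two DRL algorithms. Finally, the DRL algorithms were trained in three different unstructured citrus tree environments to plan paths for the picking manipulator. [Result] Comparative simulation experiments showed that the reinforcement learning method combined with the artificial potential field effectively improved the path planning success rate of the picking manipulator. Introducing the LSTM structure improved the convergence speed of the deep deterministic policy gradient (DDPG) algorithm by 57.25% and its path planning success rate by 23.00%, and improved the convergence speed of the soft actor-critic (SAC) algorithm by 53.73% and its path planning success rate by 9.00%. Compared with the traditional algorithm RRT-connec...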

Keywords: Picking manipulator  Citrus  Path planning  Deep reinforcement learning  Unstructured environment  LSTM
Received: 2022-06-17

Path planning method for citrus picking manipulator based on deep reinforcement learning
XIONG Chunyuan, XIONG Juntao, YANG Zhengang, HU Wenxin. Path planning method for citrus picking manipulator based on deep reinforcement learning[J]. Journal of South China Agricultural University, 2023, 44(3): 473-483.
Authors: XIONG Chunyuan  XIONG Juntao  YANG Zhengang  HU Wenxin
Institution: College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
Abstract: Objective: To address the poor training efficiency and low path planning success rate of picking manipulators that use deep reinforcement learning (DRL), this study proposed a path planning method that combines DRL with the artificial potential field for citrus picking manipulators in unstructured environments. Method: First, the picking path planning problem was solved by DRL combined with the artificial potential field method. Second, a long short-term memory (LSTM) structure was introduced to improve the Actor network and Critic network of two DRL algorithms. Finally, the DRL algorithms were trained in three different unstructured citrus growing environments to plan paths for the picking manipulator. Result: Comparative simulation experiments showed that combining DRL with the artificial potential field method effectively improved the path planning success rate. The LSTM structure improved the convergence speed of the deep deterministic policy gradient (DDPG) algorithm by 57.25% and its path planning success rate by 23.00%. It also improved the convergence speed of the soft actor-critic (SAC) algorithm by 53.73% and its path planning success rate by 9.00%. Compared with the traditional RRT-connect (rapidly-exploring random trees connect) algorithm, the SAC algorithm with the LSTM structure shortened the planned path length by 16.20% and improved the path planning success rate by 9.67%. Conclusion: The proposed method offers advantages in planned path length and path planning success rate, and can serve as a reference for solving path planning problems of picking robots in unstructured environments.
Keywords: Picking manipulator  Citrus  Path planning  Deep reinforcement learning  Unstructured environment  LSTM
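
The abstract does not state how the artificial potential field enters the DRL training, so the following is only a minimal sketch of one common way to combine the two: using the drop in potential between consecutive end-effector positions as a shaped reward. The function names, the gains `k_att` and `k_rep`, the `influence` radius, and the point-obstacle model are all assumptions made for illustration and are not taken from the paper.

```python
import math


def attractive_potential(pos, goal, k_att=1.0):
    """Quadratic attractive potential pulling the end effector toward the goal (assumed form)."""
    return 0.5 * k_att * math.dist(pos, goal) ** 2


def repulsive_potential(pos, obstacles, k_rep=1.0, influence=0.5):
    """Classic repulsive potential, active only within `influence` of a point obstacle (assumed form)."""
    total = 0.0
    for obs in obstacles:
        # Clamp the distance to avoid division by zero at the obstacle itself.
        d = max(math.dist(pos, obs), 1e-6)
        if d < influence:
            total += 0.5 * k_rep * (1.0 / d - 1.0 / influence) ** 2
    return total


def potential(pos, goal, obstacles):
    """Total potential: attraction to the fruit plus repulsion from obstacles."""
    return attractive_potential(pos, goal) + repulsive_potential(pos, obstacles)


def shaped_reward(pos, next_pos, goal, obstacles):
    """Hypothetical shaped reward: positive when a step lowers the total potential."""
    return potential(pos, goal, obstacles) - potential(next_pos, goal, obstacles)
```

Under these assumptions, a step toward the fruit that stays clear of branches yields a positive reward, while a step away from the fruit or into an obstacle's influence radius is penalized. Such a term would be added to whatever task reward the DDPG or SAC agent already receives.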