Unsupervised visual odometry estimation method for greenhouse mobile robots
Citation: WU Xiongwei, ZHOU Yuncheng, LIU Junting, LIU Zhongying, WANG Changyuan. Unsupervised visual odometry method for greenhouse mobile robots[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(10): 163-174.
Authors: WU Xiongwei  ZHOU Yuncheng  LIU Junting  LIU Zhongying  WANG Changyuan
Institution: College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
Funding: Scientific Research Project of the Education Department of Liaoning Province (LSNJC202004); National Key R&D Program of China, Intergovernmental International Science and Technology Innovation Cooperation Key Project (2019YFE0197700)
Abstract: To meet the practical need for visual odometry information during the autonomous operation of greenhouse mobile robots, and to address the scale uncertainty that arises in visual odometry estimation when geometric constraints are missing, a visual odometry estimation method based on unsupervised optical flow was proposed. Based on the geometric relationships among local images in stereo video, a local geometric consistency constraint and a corresponding optical flow model were constructed, and the structure of the optical flow estimation network was optimized. During network training, a pyramid inter-level knowledge self-distillation loss was adopted to address the lack of supervision signals for the pyramid-level flow fields. With a wheeled mobile robot as the test platform, experiments were carried out in a tomato greenhouse. The results showed that, compared with the model without the local geometric consistency constraint, the endpoint errors of the inter-frame and stereo optical flow were reduced by 8.89% and 8.96%, respectively, after applying the constraint; compared with the model without inter-level knowledge self-distillation, the two errors were reduced by 11.76% and 11.45%, respectively. Compared with visual odometry estimation based on an existing optical flow model, the relative translation error in pose tracking was reduced by 9.80%; compared with a pose estimation method trained jointly with multiple networks, this error was reduced by 43.21%. The method can also obtain dense scene depth, with a relative depth estimation error of 5.28%; within a 1 m range, the mean absolute translation error was 3.6 cm and the mean absolute rotation error was 1.3°. Compared with existing baseline methods, the proposed method improved visual odometry estimation accuracy. The results can provide a technical reference for the design of vision systems for greenhouse mobile robots.
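The following is a minimal PyTorch sketch of the pyramid inter-level knowledge self-distillation idea described above; it is an illustration under assumed conventions (the function and variable names, such as pyramid_self_distillation_loss, are hypothetical and not the authors' implementation). The finest flow field, with gradients stopped, is downsampled and rescaled to serve as a pseudo label for each intermediate flow field.

import torch
import torch.nn.functional as F

def pyramid_self_distillation_loss(pyramid_flows, level_weights=None):
    # pyramid_flows: list of flow tensors [B, 2, H_l, W_l], ordered from
    # coarsest to finest; the last entry is the full-resolution prediction.
    teacher = pyramid_flows[-1].detach()  # pseudo label, gradient stopped
    weights = level_weights or [1.0] * (len(pyramid_flows) - 1)
    loss = teacher.new_zeros(())
    for lvl_w, student in zip(weights, pyramid_flows[:-1]):
        h, w = student.shape[-2:]
        # Downsample the teacher flow to the student's resolution and rescale
        # the flow vectors by the same spatial factors.
        target = F.interpolate(teacher, size=(h, w), mode="bilinear",
                               align_corners=False)
        target = torch.stack(
            [target[:, 0] * (w / teacher.shape[-1]),
             target[:, 1] * (h / teacher.shape[-2])], dim=1)
        loss = loss + lvl_w * (student - target).abs().mean()
    return loss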

Keywords: robot  greenhouse  navigation  visual odometry  unsupervised learning  optical flow  convolutional neural network
Received: 2023-02-12
Revised: 2023-04-26

Unsupervised visual odometry method for greenhouse mobile robots
WU Xiongwei, ZHOU Yuncheng, LIU Junting, LIU Zhongying, WANG Changyuan. Unsupervised visual odometry method for greenhouse mobile robots[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(10): 163-174.
Authors:WU Xiongwei  ZHOU Yuncheng  LIU Junting  LIU Zhongying  WANG Changyuan
Institution:College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
Abstract: Simultaneous Localization and Mapping (SLAM) is one of the most crucial aspects of autonomous navigation for mobile robots, and depth perception and pose tracking are its core components. However, existing unsupervised visual odometry frameworks cannot fully meet the practical needs of greenhouse mobile robots during autonomous operation: lacking geometric constraints, their estimates suffer from scale uncertainty. In this study, an unsupervised optical flow-based visual odometry method was presented. An optical flow estimation network was trained in an unsupervised manner using image warping. The optical flow between stereo images (disparity) was used to calculate the absolute depth of the scene. The optical flow between adjacent frames of the left images was combined with the scene depth to solve the frame-to-frame pose transformation matrix using the Perspective-n-Point (PnP) algorithm. Reliable correspondences were selected during the solving process using forward-backward flow consistency checking, so that the absolute pose could be recovered. A compact convolutional neural network was built to serve as the backbone of the flow model, following the well-established principles of pyramidal processing, warping, and the use of a cost volume. Cost volume normalization was applied in the network to alleviate the influence of large feature activations at higher pyramid levels. Furthermore, local geometric consistency constraints were incorporated into the objective function of the flow model. Meanwhile, a pyramid distillation loss was introduced to provide supervision for the intermediate optical flows by distilling the finest final flow field into pseudo labels. A series of experiments was conducted using a wheeled mobile robot in a tomato greenhouse. The results showed that the improved model achieved better performance. The local geometric consistency constraints improved the optical flow estimation accuracy: the endpoint errors (EPE) of the inter-frame and stereo optical flow were reduced by 8.89% and 8.96%, respectively. The pyramid distillation loss significantly reduced the optical flow estimation error, with the EPEs of the inter-frame and stereo optical flow decreasing by 11.76% and 11.45%, respectively. Cost volume normalization further reduced the EPEs of the inter-frame and stereo optical flow by 12.50% and 7.25%, respectively, at the cost of only a 1.28% decrease in the inference speed of the optical flow network. Compared with an existing unsupervised flow model, the improved model showed a 9.52% and 9.80% decrease in the root mean square error (RMSE) and mean absolute error (MAE) of the relative translation error (RTE), respectively; compared with Monodepth2, the decreases were 43.0% and 43.21%, respectively. The pose tracking accuracy of the improved model was lower than that of ORB-SLAM3. Relying on pure multi-view geometry, the method can also predict dense depth maps of the scene, with a relative depth estimation error of 5.28%, more accurate than existing state-of-the-art self-supervised joint depth-pose learning. The accuracy of pose tracking depended mainly on the motion speed of the robot: pose tracking performance at a low speed of 0.2 m/s and a high speed of 0.8 m/s was significantly lower than at 0.4-0.6 m/s.
The resolution of the input image also greatly impacted the pose tracking accuracy, with the errors decreasing gradually as the resolution increased. With an input image resolution of 832×512 pixels and a motion range of 1 m, the MAE of the RTE was not higher than 3.6 cm, while the MAE of the relative rotation error (RRE) was not higher than 1.3°. These findings can provide technical support for designing the vision systems of greenhouse mobile robots.
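As a rough illustration of the pose-recovery pipeline summarized in the abstract (stereo flow to absolute depth, forward-backward flow consistency filtering, PnP with RANSAC), the following Python/OpenCV sketch is provided. It assumes rectified stereo images and known intrinsics; the function name recover_pose, the consistency threshold, and other details are illustrative assumptions, not the paper's implementation.

import cv2
import numpy as np

def recover_pose(disparity, flow_fwd, flow_bwd, K, baseline, fb_thresh=1.5):
    # disparity: HxW; flow_fwd, flow_bwd: HxWx2 optical flow between frames
    # t and t+1 (forward) and t+1 and t (backward); K: 3x3 intrinsics;
    # baseline: stereo baseline in metres.
    h, w = disparity.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    depth = fx * baseline / np.maximum(disparity, 1e-6)  # Z = f * B / d

    ys, xs = np.mgrid[0:h, 0:w]
    # Forward-backward consistency check: sample the backward flow at the
    # forward-warped positions and require it to (almost) cancel the forward flow.
    xt = np.clip(xs + flow_fwd[..., 0], 0, w - 1).astype(np.float32)
    yt = np.clip(ys + flow_fwd[..., 1], 0, h - 1).astype(np.float32)
    bwd_at_target = cv2.remap(flow_bwd.astype(np.float32), xt, yt, cv2.INTER_LINEAR)
    fb_err = np.linalg.norm(flow_fwd + bwd_at_target, axis=-1)
    valid = (fb_err < fb_thresh) & (disparity > 0)

    # Back-project valid pixels of frame t to 3-D and pair them with their
    # matched 2-D positions in frame t+1.
    Z = depth[valid]
    X = (xs[valid] - cx) / fx * Z
    Y = (ys[valid] - cy) / fy * Z
    obj_pts = np.stack([X, Y, Z], axis=-1).astype(np.float32)
    img_pts = np.stack([xt[valid], yt[valid]], axis=-1).astype(np.float32)

    # Solve the frame-to-frame pose with PnP + RANSAC (needs >= 4 points).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # transform from frame-t to frame-(t+1) camera coordinates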
Keywords:robot  greenhouse  navigation  visual odometry  unsupervised learning  optical flow  convolutional neural network