Pose tracking of a greenhouse mobile robot based on self-supervised learning
Citation: Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng, Wu Qiong. Pose tracking of a greenhouse mobile robot based on self-supervised learning[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(9): 263-274.
Authors: Zhou Yuncheng  Xu Tongyu  Deng Hanbing  Miao Teng  Wu Qiong
Affiliation: College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
Funding: Basic Research Project of the Education Department of Liaoning Province (LSNJC202004); Natural Science Foundation of Liaoning Province (20180551102); National Natural Science Foundation of China (31901399)
Abstract: To track the position and attitude of a robot moving in a greenhouse environment, this study proposed a self-supervised pose transformation estimation model based on a temporal consistency constraint. The model uses a soft mask to handle the shrinkage of pose predictions caused by stillness between video frames, and further uses a normalized mask to deal with non-rigid scenes and object occlusion. A star dilated convolution was designed, and an auto-encoder for the model was built on this convolution. Training and testing experiments were carried out on video data collected in a solar greenhouse planted with tomato. The results show that, compared with a model without mask processing, the model with the soft mask reduced the relative errors of position and attitude estimation by 5.06 and 11.05 percentage points, respectively, and the model with the normalized mask reduced these two errors by 4.15 and 3.86 percentage points, respectively, so both masks can significantly improve model accuracy. The star dilated convolution was effective in reducing model error: with the number of network parameters unchanged, it reduced the relative error of attitude estimation by 7.54 percentage points. The temporal consistency constraint reduced the root mean square error of attitude estimation by 36.48% and the cumulative attitude-angle error per hundred frames by 54.75%, so this constraint can be used to improve model accuracy and stability. The relative errors of position and attitude estimation in this study were 8.29% and 5.71%, respectively, a reduction of 8.61% and 6.83% compared with Monodepth2. This study can provide a reference for the design of navigation systems for greenhouse mobile robots.
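As an illustration of the soft-mask idea described in the abstract, the following is a minimal sketch in Python (PyTorch) of a soft-masked photometric re-projection loss. The function names and the sigmoid-based weighting are assumptions chosen for illustration; the paper defines its own mask formulation and warping procedure.

import torch

def photometric_error(a, b):
    # Per-pixel L1 photometric difference (SSIM is often added in practice).
    return (a - b).abs().mean(dim=1, keepdim=True)

def soft_masked_loss(target, source, source_warped):
    # Error after warping the source frame into the target view with the
    # predicted depth and pose.
    err_warped = photometric_error(target, source_warped)
    # Error of the raw, un-warped source frame against the target.
    err_static = photometric_error(target, source)
    # Soft mask in (0, 1): above 0.5 where warping reduces the error relative
    # to the un-warped source, at or below 0.5 where it does not (e.g. static
    # frames), so such pixels are down-weighted and do not pull the predicted
    # pose toward zero.
    mask = torch.sigmoid(err_static - err_warped)
    return (mask * err_warped).mean()

# Example with dummy frames of shape (batch, 3, H, W):
# loss = soft_masked_loss(torch.rand(1, 3, 120, 160),
#                         torch.rand(1, 3, 120, 160),
#                         torch.rand(1, 3, 120, 160))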

Keywords: robot  greenhouse  navigation  pose tracking  self-supervised learning  visual odometry  convolutional neural network  deep learning
Received: 2021-02-01
Revised: 2021-04-16

Self-supervised pose estimation method for a mobile robot in greenhouse
Citation: Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng, Wu Qiong. Self-supervised pose estimation method for a mobile robot in greenhouse[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(9): 263-274.
Authors: Zhou Yuncheng  Xu Tongyu  Deng Hanbing  Miao Teng  Wu Qiong
Institution: College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
Abstract: Simultaneous localization and mapping (SLAM) plays a vital role in the autonomous navigation of mobile robots in unknown environments, and visual odometry (VO) is a core component of the localization module in a SLAM system, from which the pose and velocity of a robot can be estimated using computational geometry. Learning-based VO has also achieved great success in jointly estimating camera ego-motion and depth from videos. In this study, a novel self-supervised VO model was proposed to realize the autonomous operation of a mobile robot in a greenhouse. A temporal depth consistency constraint was introduced into the learning framework together with binocular baseline supervision. Stereo video sequences were used to train the model, and the trained pose network was then used for pose estimation. A pre-test found that stillness between video frames caused the pose predictions of the model to shrink. Therefore, a soft mask was applied to the photometric re-projection error to remove static regions from the apparent difference measurement, and non-rigid scenes and occlusion were further handled with normalized mask planes. Meanwhile, a new star dilated convolution (SDC) was designed, in which the filter extracts image features with a dense 3×3 kernel at the center and 1-D kernels along eight directions. The computational cost of the SDC is therefore lower than that of a regular convolution with the same receptive field. Moreover, the SDC can be composed in the spatial dimensions from depth-wise convolutions with different dilation rates, without any need to modify existing deep learning frameworks. A convolutional auto-encoder (CAE) with a residual network architecture was constructed from the SDC and inverted residual modules (IRM), and served as the backbone network of the VO model. Video sequences were collected with a binocular camera in solar greenhouses planted with tomato, and a stereo video dataset was constructed for the training and testing experiments. When the soft mask was used to remove static samples of the video sequences from the image apparent difference measurement, the mean relative errors (MREs) of translation and rotation estimation were reduced by 5.06% and 11.05%, respectively, and the root mean square errors (RMSEs) were reduced by 24.78% and 30.65%, respectively. When a normalized mask plane was used to deal with non-rigid scenes and occlusion, the MREs of translation and rotation estimation were reduced by 4.15% and 3.86%, respectively. This indicates that both masks significantly improved the accuracy of the model. Meanwhile, the SDC-based IRM (SDC-IRM) reduced the MRE of rotation estimation by 7.54% with the number of network parameters unchanged. Since the SDC-IRM structure was clearly effective in reducing model error, enlarging the receptive field is an effective way to improve the accuracy of the model. When the temporal depth consistency constraint was used in the model, the MRE and RMSE of rotation estimation were reduced by 2.74% and 36.48%, respectively, and the mean cumulative rotation error per hundred frames (MCRE) decreased by 54.75%, indicating high accuracy and stability of pose estimation. The MRE of rotation estimation was reduced by 7.30% when the expansion factor of the IRMs was increased.
These data demonstrate that enlarging the receptive field of the SDC kernel contributed to higher accuracy of rotation estimation. Nevertheless, there was no further obvious improvement once the maximum dilation rate exceeded 6. More importantly, the final pose estimation network ran at up to 56.5 frames per second. The MREs of translation and rotation estimation were 8.29% and 5.71%, respectively, and the pose estimation performed better than the previous VO model under similar input settings. These findings can provide sound support for the design of navigation systems for mobile robots in a greenhouse.
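To make the SDC structure described above concrete, the following is a minimal sketch in Python (PyTorch) of one plausible way to realize it: a dense 3×3 depth-wise kernel for the center of the receptive field, superimposed with dilated 3×3 depth-wise convolutions whose sampling points fall along the eight horizontal, vertical and diagonal directions. The class name, the chosen dilation rates (up to 6, matching the maximum rate discussed above) and the 1×1 channel projection are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class StarDilatedConv(nn.Module):
    # Illustrative sketch of a star dilated convolution built only from
    # standard depth-wise convolutions, so no framework changes are needed.
    def __init__(self, channels, dilations=(2, 3, 4, 5, 6)):
        super().__init__()
        # Dense 3x3 depth-wise kernel covering the center of the receptive field.
        self.center = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Each dilated 3x3 depth-wise kernel adds 8 sampling points along the
        # horizontal, vertical and diagonal directions: the "arms" of the star.
        self.arms = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d, groups=channels)
             for d in dilations]
        )
        # 1x1 convolution to mix channels after the spatial aggregation (assumed).
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        out = self.center(x)
        for arm in self.arms:
            out = out + arm(x)  # superimpose the dilated depth-wise responses
        return self.project(out)

# Example: a drop-in spatial layer on a 64-channel feature map.
# y = StarDilatedConv(64)(torch.randn(1, 64, 120, 160))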
Keywords: robots  greenhouse  navigation  pose estimation  self-supervised learning  visual odometry  convolutional neural network  deep learning