
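Method for estimating the image depth of tomato plant based on self-supervised learning
Cite this article:Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng, Wu Qiong. Method for estimating the image depth of tomato plant based on self-supervised learning[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(24): 173-182.
Authors:Zhou Yuncheng  Xu Tongyu  Deng Hanbing  Miao Teng  Wu Qiong
Affiliation (all authors):College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
Funding:Natural Science Foundation of Liaoning Province (20180551102); National Natural Science Foundation of China (31601218)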
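Abstract:Depth estimation is key for the vision systems of intelligent agricultural machinery to achieve 3D scene reconstruction and target localization. This paper proposes a self-supervised depth estimation network model for tomato plant images; the model takes binocular (stereo) images directly as input and estimates the depth of every pixel. Three channel-wise group convolution modules were designed and used to build a convolutional auto-encoder that serves as the main structure of the depth estimation network. Because hand-crafted features are inadequate for measuring the similarity of two images, a convolutional feature similarity loss was introduced as a component of the loss function. The results show that the convolutional auto-encoder built from group convolution modules effectively improves the accuracy of the disparity maps produced by the depth estimation network. The convolutional feature similarity loss significantly improves depth estimation accuracy for tomato plant images; accuracy rises as more convolution module layers take part in computing the loss, but beyond 4 layers the further gain is no longer evident. When the stereo images are sampled within 9.0 m, the root mean square error and mean absolute error of the checkerboard corner distances estimated by the method are below 2.5 cm and 1.8 cm, respectively; within 3.0 m they are below 0.7 cm and 0.5 cm, respectively. The model runs at 28.0 frames/s. Compared with existing research, the two errors were reduced by 33.1% and 35.6%, respectively, and the computing speed increased by 52.2%. This study can serve as a reference for the design of vision systems for intelligent agricultural machinery.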

Keywords:image processing  convolutional neural network  algorithms  self-supervised learning  depth estimation  disparity  deep learning  tomato
Received:2018/11/1
Revised:2019/12/10

Method for estimating the image depth of tomato plant based on self-supervised learning
Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng and Wu Qiong. Method for estimating the image depth of tomato plant based on self-supervised learning[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(24): 173-182.
Authors:Zhou Yuncheng  Xu Tongyu  Deng Hanbing  Miao Teng and Wu Qiong
Institution (all authors):College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
Abstract:Depth estimation is critical to 3D reconstruction and object localization in the vision systems of intelligent agricultural machinery, and stereo matching is a common way to obtain it. Traditional stereo matching methods rely on hand-crafted low-level image features. Because the color and texture of field plants are non-uniform, such hand-crafted features discriminate poorly and mismatches can occur, which compromises the accuracy of the depth map. Although a convolutional neural network (CNN) trained with supervised learning can estimate the depth of each pixel in a plant image directly, annotating depth data is expensive. In this paper, we present a depth estimation model based on self-supervised learning for phenotyping the tomato canopy. The method casts depth estimation as an image reconstruction task: the network takes a rectified stereo image pair as input and estimates dense disparity maps indirectly, and the disparity maps drive bilinear interpolation sampling of the input images to reconstruct the warped images. We developed three channel-wise group convolutional (CWGC) modules, namely a dimension-invariant convolution module, a down-sampling convolution module and an up-sampling convolution module, and used them to construct the convolutional auto-encoder that forms the main structure of the depth estimation network. Because hand-crafted features are inadequate for comparing image similarity, we used an image convolutional feature similarity loss as one objective of network training. A CWGC-based CNN classification network (CWGCNet) was developed to extract the low-level features automatically. In addition to the convolutional feature similarity loss, the overall training loss includes an image appearance matching loss, a disparity smoothness loss and a left-right disparity consistency loss.
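The reconstruction step described above can be illustrated with a minimal NumPy sketch. It is not the authors' CNTK implementation. It assumes single-channel images, a rectified pair (so disparity is purely horizontal and bilinear sampling reduces to linear interpolation along each row), and the sign convention that a left-image pixel at column x corresponds to the right-image pixel at column x - d. The function names `warp_with_disparity` and `photometric_l1` are hypothetical, and the plain L1 difference is a simplified stand-in for the paper's appearance matching loss, whose exact form the abstract does not give.

```python
import numpy as np


def warp_with_disparity(src, disparity):
    """Reconstruct a view by sampling `src` at column x - disparity.

    src:       (H, W) float array, e.g. the right image of a rectified pair.
    disparity: (H, W) float array of horizontal disparities in pixels.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = disparity.shape
    # Column coordinate in `src` that each output pixel should read from.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)       # left neighbour column
    x1 = np.minimum(x0 + 1, w - 1)      # right neighbour column
    frac = xs - x0                      # interpolation weight of x1
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * src[rows, x0] + frac * src[rows, x1]


def photometric_l1(reconstructed, target):
    """Mean absolute difference between a reconstruction and its target."""
    return float(np.mean(np.abs(reconstructed - target)))
```

In a self-supervised setup of this kind, the network's disparity output would be passed to a differentiable version of `warp_with_disparity`, and the reconstruction error against the real left image would supply the training signal in place of ground-truth depth.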
Stereo image pairs of tomato plants were captured with a binocular camera in a greenhouse. After epipolar rectification, the image pairs were used to build the training and test sets for the depth estimation model. The CWGCNet and the depth estimation network for tomato images were implemented in Python with the Microsoft Cognitive Toolkit (CNTK). Both training and testing were run on a computer with a Tesla K40c GPU (graphics processing unit). The results showed that the shallow convolutional layers of the CWGCNet extracted diverse low-level image features suitable for computing the convolutional feature similarity loss. The convolutional auto-encoder developed in this paper significantly improved the accuracy of the disparity map estimated by the depth estimation model. The convolutional feature similarity loss had a marked effect on depth accuracy: the accuracy of the estimated disparity map increased with the number of convolution modules used to compute the loss. When images were sampled within 9.0 m, the root mean square error (RMSE) and the mean absolute error (MAE) of the checkerboard corner distance estimated by the model were less than 2.5 cm and 1.8 cm, respectively; within 3.0 m, they were less than 0.7 cm and 0.5 cm, respectively. The coefficient of determination (R2) of the proposed model was 0.8081, and the test speed was 28 fps (frames per second). Compared with existing models, the proposed model reduced the RMSE and MAE by 33.1% and 35.6%, respectively, and increased calculation speed by 52.2%.
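To make the reported error measures concrete, the sketch below converts disparity to depth with the standard rectified-stereo relation Z = f·B/d and computes RMSE and MAE between estimated and reference distances. It is an illustration only: the function names are hypothetical, and the abstract does not give the camera's focal length or baseline, so the caller must supply those values.

```python
import numpy as np


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels for a rectified stereo pair.

    Uses Z = f * B / d. Non-positive disparities map to infinity.
    focal_px and baseline_m come from camera calibration (assumed inputs).
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)


def rmse(estimated, reference):
    """Root mean square error between two equally shaped arrays."""
    diff = np.asarray(estimated, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(estimated, reference):
    """Mean absolute error between two equally shaped arrays."""
    diff = np.asarray(estimated, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    return float(np.mean(np.abs(diff)))
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error at long range. This is consistent with the errors reported within 9.0 m being larger than those within 3.0 m.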
Keywords:image processing  convolution neural network  algorithms  self-supervised learning  depth estimation  disparity  deep learning  tomato
This article is indexed in CNKI and other databases.