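Recognition of unstructured field road scenes based on semantic segmentation
Cite this article: Meng Qingkuan, Yang Xiaoxia, Zhang Man, Guan Haiou. Recognition of unstructured field road scenes based on semantic segmentation[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(22): 152-160
Authors: Meng Qingkuan  Yang Xiaoxia  Zhang Man  Guan Haiou
Affiliations: College of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin 300222; Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083; College of Electrical and Information, Heilongjiang Bayi Agricultural University, Daqing 163319
Funding: National Natural Science Foundation of China (31571570, 62001329); Natural Science Foundation of Tianjin (18JCQNJC04500, 19JCQNJC01700); Tianjin University of Technology and Education university-level pre-research projects (KJ2009, KYQD1706)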
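Abstract (translated from the Chinese summary): Environmental information perception is one of the key technologies for autonomous navigation of intelligent agricultural equipment. Field roads are complex and variable; identifying passable areas quickly and accurately and distinguishing obstacle categories provides a basis for efficient and safe path planning and decision control. Taking unstructured agricultural field road scenes as the research object, this study divides environmental objects into categories according to their dynamic and static attributes and proposes a lightweight semantic segmentation model based on channel attention combined with multi-scale feature fusion...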

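Keywords: machine vision  semantic segmentation  environmental perception  unstructured roads  lightweight convolution  attention mechanism  feature fusion
Received: 2021-06-01
Revised: 2021-09-16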

Recognition of unstructured field road scene based on semantic segmentation model
Meng Qingkuan, Yang Xiaoxia, Zhang Man, Guan Haiou. Recognition of unstructured field road scene based on semantic segmentation model[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(22): 152-160
Authors: Meng Qingkuan  Yang Xiaoxia  Zhang Man  Guan Haiou
Affiliation: 1. College of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin Key Laboratory of Information Sensing and Intelligent Control, Tianjin 300222, China; 2. Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China; 3. College of Electrical and Information, Heilongjiang Bayi Agricultural University, Daqing 163319, China
Abstract: Environmental information perception is one of the key technologies for automatic navigation of agricultural machinery in tasks such as fertilization, crop disease detection, automatic harvesting, and cultivation. Field roads form a complex environment characterized by fuzzy road edges, uneven surfaces, and irregular shapes. Agricultural machinery must identify passable areas and obstacles accurately and rapidly to support path planning and decision control. In this study, a lightweight semantic segmentation model was proposed to recognize unstructured field roads using a channel attention mechanism combined with multi-scale feature fusion. Environmental objects were classified into 12 categories according to their static and dynamic properties: building, person, vehicles, sky, waters, plants, road, soil, pole, sign, coverings, and background. The mobile architecture MobileNetV2 was adopted to extract image features, reducing the number of model parameters and increasing inference speed. Specifically, an inverted residual structure with lightweight depth-wise convolutions was used to filter the features in the intermediate expansion layer. In addition, hybrid dilated convolution (HDC) was applied in the last two stages of the backbone network to enlarge the receptive field while maintaining the resolution of the feature map. The HDC used dilation rates of 1, 2, and 3, which expanded the receptive field while alleviating the gridding problem caused by standard dilated convolution. A channel attention block (CAB) was also introduced to reweight the features of each stage in order to enhance class consistency, strengthening both the higher- and lower-level features of each stage for better prediction.
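The effect of the dilation rates 1, 2, and 3 on the gridding problem can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: the function names and the comparison schedule (2, 2, 2) are assumptions chosen for illustration, and a 3-tap kernel is assumed. The sketch counts which input positions a stack of dilated convolutions can actually reach inside its receptive field.

```python
def reachable_offsets(rates, kernel_size=3):
    """Input offsets (relative to one output position) touched by a stack
    of 1-D dilated convolutions with the given dilation rates."""
    half = kernel_size // 2
    offsets = {0}
    for rate in rates:
        # A dilated kernel samples positions -half*rate, ..., 0, ..., +half*rate.
        taps = [k * rate for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def receptive_field(rates, kernel_size=3):
    """Width of the receptive field of the stacked layers."""
    return 1 + sum((kernel_size - 1) * rate for rate in rates)


def coverage(rates, kernel_size=3):
    """Fraction of positions inside the receptive field that are sampled."""
    return len(reachable_offsets(rates, kernel_size)) / receptive_field(rates, kernel_size)


if __name__ == "__main__":
    for schedule in [(1, 2, 3), (2, 2, 2)]:
        print(schedule, receptive_field(schedule), coverage(schedule))
```

Both schedules span the same receptive field width, but the constant-rate stack only reaches every other position, which is the gridding pattern; the mixed rates reach every position.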
In addition, some semantic segmentation errors are partially or completely related to contextual relationships. A pyramid pooling module was therefore adopted to fuse feature maps at three scales as a global contextual prior. The first, image-level branch captured global context information, with its feature vector produced by global average pooling. The remaining pyramid levels divided the feature map into sub-regions and produced a pooled representation for each location. The outputs of the different pyramid levels therefore had different sizes; they were upsampled and concatenated to form the final output. The results showed that objects in complex road scenes were effectively segmented, with pixel accuracy (PA) and mean pixel accuracy (MPA) of 94.85% and 90.38%, respectively. Furthermore, the single-category pixel accuracy exceeded 90% for some objects, such as road, plants, building, waters, sky, and soil, indicating high accuracy, strong robustness, and good generalization. To verify the efficiency of the model, the mean intersection over union (MIoU), segmentation speed, and parameter scale were adopted as evaluation indexes, and the FCN-8S, SegNet, DeeplabV3+, and BiseNet networks were trained and tested on the same datasets. The MIoU of the proposed model was 85.51%, higher than that of the other networks. The parameter count of the model was 2.41×10^6, smaller than those of FCN-8S, SegNet, DeeplabV3+, and BiseNet. For an image with a resolution of 512×512 pixels, the inference speed of the model reached 8.19 frames per second, indicating a good balance between speed and accuracy. Consequently, the lightweight semantic segmentation model can accurately and rapidly segment multiple road scenes in the field environment.
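For readers unfamiliar with the three accuracy indexes, the sketch below computes PA, MPA, and MIoU from a confusion matrix using their commonly used definitions. The abstract does not give the paper's exact formulas, so these definitions, the function name `segmentation_metrics`, and the toy matrix are assumptions for illustration only.

```python
def segmentation_metrics(confusion):
    """Return (PA, MPA, MIoU) for a square confusion matrix where
    confusion[i][j] counts pixels of true class i predicted as class j."""
    n = len(confusion)
    diag = [confusion[i][i] for i in range(n)]
    row_sums = [sum(confusion[i]) for i in range(n)]  # pixels per true class
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # per predicted class
    total = sum(row_sums)

    # Pixel accuracy: correctly labeled pixels over all pixels.
    pa = sum(diag) / total

    # Mean pixel accuracy: per-class accuracy averaged over classes that occur.
    class_acc = [diag[i] / row_sums[i] for i in range(n) if row_sums[i] > 0]
    mpa = sum(class_acc) / len(class_acc)

    # Mean IoU: intersection over union per class, averaged over classes
    # that appear in either the ground truth or the prediction.
    ious = []
    for i in range(n):
        union = row_sums[i] + col_sums[i] - diag[i]
        if union > 0:
            ious.append(diag[i] / union)
    miou = sum(ious) / len(ious)
    return pa, mpa, miou


if __name__ == "__main__":
    toy = [[3, 1],
           [1, 5]]  # hypothetical 2-class example, not data from the paper
    print(segmentation_metrics(toy))
```

Because MPA and MIoU average over classes rather than pixels, they penalize poor performance on rare categories more heavily than PA does, which is consistent with the reported MPA and MIoU being lower than the PA.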
The findings can provide a technical reference for the safe and reliable operation of intelligent agricultural machinery on unstructured roads.
Keywords:machine vision   semantic segmentation   environmental perception   unstructured field roads   lightweight convolution   attention mechanism   feature fusion
This article is indexed in Wanfang Data and other databases.