Real-time semantic segmentation method for field grapes based on channel feature pyramid
Citation: Sun Jun, Gong Dongjian, Yao Kunshan, Lu Bing, Dai Chunxi, Wu Xiaohong. Real-time semantic segmentation method for field grapes based on channel feature pyramid[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(17): 150-157.
Authors: Sun Jun  Gong Dongjian  Yao Kunshan  Lu Bing  Dai Chunxi  Wu Xiaohong
Institution: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
Funding: Project of the Faculty of Agricultural Equipment, Jiangsu University (NZXB20210210); Priority Academic Program Development of Jiangsu Higher Education Institutions, Phase III (PAPD-2018-87)
Abstract: Rapid detection and recognition of grapes in complex environments is a key step toward intelligent harvesting. To address the low recognition accuracy and poor real-time performance of current methods, this study proposed a lightweight Grape Real-time Semantic Segmentation Model (GRSM). First, a Channel-wise Feature Pyramid (CFP) module was used for feature extraction; through skip connections over 1×3 and 3×1 dilated convolutions, the module extracted multi-scale features and contextual information from grape images while reducing the number of model parameters. Then, a pooling-convolution fusion structure was adopted for down-sampling, adding trainable parameters to reduce information loss. Finally, skip connections were used to fuse multiple feature maps and recover image details. Experimental results showed that the proposed model achieved a mean intersection-over-union of 78.8% on the field grape test set, a mean pixel accuracy of 90.3%, a processing speed of 68.56 frames/s, and a model size of only 4.88 MB. With its high segmentation accuracy and good real-time performance, the model meets the requirements of the visual recognition system of grape-picking robots and provides a theoretical basis for intelligent grape harvesting.
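The abstract describes the CFP module only at a high level. As an illustrative aid, the following PyTorch sketch shows one plausible way to realize skip connections over factorized 1×3 and 3×1 dilated convolutions, where dilation rates 1, 2, and 3 approximate the 3×3, 5×5, and 7×7 receptive fields mentioned in the English abstract. The class names (CFPBranch, CFPModule), the channel split, and the dilation settings are editorial assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class CFPBranch(nn.Module):
    """One channel branch of a CFP-style module (illustrative sketch).

    Factorized 1x3 and 3x1 dilated convolutions give a large receptive
    field with few parameters; a skip connection adds the branch input
    back to preserve fine detail.
    """
    def __init__(self, channels, dilation):
        super().__init__()
        pad = dilation
        self.conv1x3 = nn.Conv2d(channels, channels, kernel_size=(1, 3),
                                 padding=(0, pad), dilation=(1, dilation), bias=False)
        self.conv3x1 = nn.Conv2d(channels, channels, kernel_size=(3, 1),
                                 padding=(pad, 0), dilation=(dilation, 1), bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1x3(x)
        out = self.conv3x1(out)
        out = self.bn(out)
        return self.act(out + x)  # skip connection fuses input and branch output

class CFPModule(nn.Module):
    """Split channels, run branches with different dilation rates in
    parallel (receptive fields ~3x3, 5x5, 7x7), and concatenate them."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        assert channels % len(dilations) == 0
        self.split = channels // len(dilations)
        self.branches = nn.ModuleList(CFPBranch(self.split, d) for d in dilations)

    def forward(self, x):
        chunks = torch.split(x, self.split, dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)

Because each branch processes only a fraction of the input channels, stacking such branches keeps the parameter count low, which is consistent with the lightweight design goal stated above.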

Keywords: machine vision  image recognition  semantic segmentation  real-time performance  grape  CFP
Received: 2022-03-10
Revised: 2022-07-22

Real-time semantic segmentation method for field grapes based on channel feature pyramid
Sun Jun, Gong Dongjian, Yao Kunshan, Lu Bing, Dai Chunxi, Wu Xiaohong. Real-time semantic segmentation method for field grapes based on channel feature pyramid[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(17): 150-157.
Authors:Sun Jun  Gong Dongjian  Yao Kunshan  Lu Bing  Dai Chunxi  Wu Xiaohong
Institution:School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
Abstract: Automated, intelligent harvesting is an urgent task in the grape industry, but current fruit recognition models struggle to balance accuracy against real-time performance. In this study, a lightweight real-time semantic segmentation model based on a channel feature pyramid was proposed for field grape harvesting. First, a publicly available field grape instance segmentation dataset was used as the experimental material. A total of 300 grape images were collected across different pruning periods, lighting conditions, and maturity levels, and the LabelMe annotation tool was used to build the field grape dataset with four annotated classes: background, leaves, grapes, and stems. The dataset was then expanded by random augmentation to 1200 images. Because the original images were too large to be trained directly, their resolution was uniformly compressed to 512×512 pixels to improve training efficiency. Second, since grapes vary greatly in size and location, convolutional kernels with different receptive fields were arranged, and the channel feature pyramid module was used for feature extraction. Within a single channel, skip connections over 1×3 and 3×1 dilated convolutions produced 3×3, 5×5, and 7×7 multi-scale receptive fields, so that multi-scale and contextual features were effectively extracted from the grape images while the number of model parameters was reduced. During down-sampling, a pooling-convolution fusion structure replaced the traditional max-pooling structure, adding trainable parameters to reduce information loss. In the decoding part, skip connections fused information from different feature layers to recover image details. Finally, the improved model was evaluated on a grape test set whose segmentation targets included grapes, stems, leaves, and background. The experimental results showed that the model achieved a mean Intersection over Union (IoU) of 78.8%, with class IoUs of 84.5%, 79.0%, and 72.4% for grapes, stems, and leaves, respectively. The mean pixel accuracy was 90.3%, the processing speed reached 68.56 frames per second, and the model size was kept within 5 MB. Compared with the real-time semantic segmentation networks BiSeNet, ENet, and DFANet, the mean IoU was improved by 7.9, 5.7, and 10.5 percentage points, respectively; compared with lightweight networks using MobileNetV3 and Inception as encoders, it increased by 1.2 and 8.8 percentage points, respectively. The proposed network therefore showed a clear advantage in segmentation accuracy over existing real-time and lightweight networks. Compared with the classical networks DeepLabv3+, SegNet, and UNet, the mean IoU was lower by 2.3, 2.0, and 3.7 percentage points, respectively, but the model size was only 12.3%, 4.1%, and 7.4% of theirs, fully meeting the real-time requirement and achieving a good trade-off between accuracy and speed. The improved model can be expected to serve the segmentation and recognition of field grapes in smart agriculture and to provide technical support for the visual recognition systems of grape-picking robots.
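As a reading aid for the pooling-convolution fusion down-sampling mentioned above, the sketch below assumes an ENet-style initial block in which a strided 3×3 convolution (trainable) and a 2×2 max-pooling path run in parallel and are concatenated, so resolution is halved while extra trainable filters limit the information lost by pooling alone. The class name DownsampleFusion and the channel arithmetic are illustrative assumptions; the paper's exact block is not specified in this abstract.

import torch
import torch.nn as nn

class DownsampleFusion(nn.Module):
    """Pooling-convolution fusion down-sampling (illustrative sketch).

    A strided convolution and a max-pooling path are computed in parallel
    and concatenated along the channel dimension.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        conv_channels = out_channels - in_channels  # pooled path keeps in_channels
        self.conv = nn.Conv2d(in_channels, conv_channels, kernel_size=3,
                              stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.act(self.bn(out))

# Example: halve a 512x512 RGB grape image tensor while growing channels 3 -> 16
x = torch.randn(1, 3, 512, 512)
print(DownsampleFusion(3, 16)(x).shape)  # torch.Size([1, 16, 256, 256])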
Keywords: machine vision  image recognition  semantic segmentation  real-time  grape  CFP