Real-time instance segmentation of maize ears based on SwinT-YOLACT
Citation: ZHU Deli, YU Maosheng, LIANG Mingfei. Real-time instance segmentation of maize ears based on SwinT-YOLACT[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(14): 164-172.
Authors: ZHU Deli  YU Maosheng  LIANG Mingfei
Affiliations: 1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China; 2. Research Center of Chongqing Digital Agricultural Service Engineering Technology, Chongqing 401331, China
Funding: Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201800536); Chongqing University Innovation Research Group Project "Machine Vision Perception and Intelligent Algorithms for Smart Agriculture" (CXQT20015)
Abstract: The phenotypic parameters of maize ears are important indicators of maize growth status, which directly affects yield and quality. To enable the vision system of unmanned inspection robots to acquire maize phenotypic parameters automatically and at high throughput, this study proposes SwinT-YOLACT, a maize ear segmentation model that balances high accuracy and speed, built on YOLACT (you only look at coefficients). First, Swin-Transformer is used as the backbone feature extraction network to strengthen feature extraction. Second, an efficient channel attention mechanism is introduced before the feature pyramid network to remove redundant feature information and strengthen the fusion of key features. Finally, the smoother Mish activation function replaces the original ReLU activation, further improving accuracy while maintaining the original speed. The model was trained and tested on a self-built maize ear dataset. Experimental results show that SwinT-YOLACT achieves a mask mean average precision (mAP) of 79.43% and an inference speed of 35.44 frames/s; compared with the original YOLACT and its improved version YOLACT++, mask mAP is higher by 3.51 and 3.38 percentage points, respectively, and compared with YOLACT, YOLACT++, and Mask R-CNN, inference speed is higher by 3.39, 2.58, and 28.64 frames/s, respectively. The model segments maize ears well, is suitable for deployment on the vision systems of unmanned inspection robots, and provides technical support for monitoring maize growth status.
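To make the channel-attention step described above concrete, the following PyTorch sketch shows a minimal ECA (efficient channel attention) block of the kind inserted before the feature pyramid network. The three channel widths (192, 384, 768, typical of the last Swin-Transformer stages), the feature-map sizes, and the standalone usage are illustrative assumptions, not the authors' released implementation.

import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    # Efficient channel attention: a global average pool followed by a 1-D
    # convolution over the channel descriptor, producing per-channel weights
    # without dimensionality reduction.
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))  # adaptive kernel size
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                    # x: (N, C, H, W)
        y = self.pool(x)                                     # (N, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))       # (N, 1, C)
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))  # (N, C, 1, 1)
        return x * y                                         # re-weight channels

# Illustrative use: re-weight three multi-scale backbone feature maps before
# FPN fusion. Channel widths are assumed Swin-T stage outputs, not paper code.
feats = [torch.randn(1, c, s, s) for c, s in [(192, 80), (384, 40), (768, 20)]]
eca_layers = nn.ModuleList(ECA(f.shape[1]) for f in feats)
refined = [eca(f) for eca, f in zip(eca_layers, feats)]
print([tuple(r.shape) for r in refined])

Because the attention weights come from a single small 1-D convolution per scale, the added cost is negligible, which is consistent with the abstract's observation that inference speed is essentially unchanged after adding the attention layers.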

Keywords: image segmentation  attention mechanism  maize ear  YOLACT  Swin-Transformer
Received: 2023-02-28
Revised: 2023-04-17

Real-time instance segmentation of maize ears using SwinT-YOLACT
ZHU Deli, YU Maosheng, LIANG Mingfei. Real-time instance segmentation of maize ears using SwinT-YOLACT[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(14): 164-172.
Authors:ZHU Deli  YU Maosheng  LIANG Mingfei
Institution:1. College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China;2. Research Center of Chongqing Digital Agricultural Service Engineering Technology, Chongqing 401331, China
Abstract: Maize is one of the most important food crops for agricultural development and food security in China. Its ears directly determine yield and quality, and their phenotypic parameters (such as size and shape) are crucial indicators of plant growth status. Machine vision is well suited to acquiring maize phenotypic parameters and analyzing traits because of its objectivity, accuracy, and speed, particularly as artificial intelligence technology is applied in agricultural production, and field inspection robots can monitor maize growth status in large-scale planting. This study aims at high-throughput, automated acquisition of maize phenotypic parameters by unmanned inspection robots. A maize ear segmentation model balancing high accuracy and speed, SwinT-YOLACT, was proposed based on the YOLACT (you only look at coefficients) algorithm, with three optimization strategies designed for the characteristics of the maize ear segmentation task. First, Swin-Transformer was used as the backbone feature extraction network, whose Transformer self-attention mechanism enhances global feature extraction. Second, a three-layer efficient channel attention mechanism was introduced before the feature pyramid network to eliminate redundant feature information and strengthen the fusion of key features. Finally, the smoother Mish activation function replaced the original ReLU activation function to further improve segmentation accuracy at the original inference speed. Maize plant images were collected in the field under different environmental backgrounds and at various ear maturity stages, manually labeled with the Labelme annotation software in COCO dataset format, and expanded by data augmentation to build a maize ear segmentation dataset for model training. The self-built dataset was used to train and test the improved model. Ablation experiments show that, compared with the original YOLACT model, introducing Swin-Transformer as the backbone improved the mask mean average precision (mAP) by 2.11 percentage points without affecting segmentation speed; adding efficient channel attention before the feature pyramid network, with Swin-Transformer as the backbone, further improved mask mAP by 0.65 percentage points while leaving the inference speed basically unchanged; and replacing the ReLU activation function with Mish on top of the first two changes improved mask mAP by another 0.75 percentage points and increased inference speed by 2.74 frames per second. SwinT-YOLACT was then compared with the YOLACT, YOLACT++, YOLACT-Edge, and Mask R-CNN segmentation models under the same experimental environment and training strategy. The comparison shows that the mask mAP of SwinT-YOLACT reached 79.43%, which is 3.51, 3.38, and 7.88 percentage points higher than those of the original YOLACT, the improved YOLACT++, and YOLACT-Edge, respectively, and only slightly lower than that of Mask R-CNN, giving the improved model strong performance on this segmentation task. In terms of speed, SwinT-YOLACT ran at 35.44 frames per second, far faster than Mask R-CNN at 6.80 frames per second and 3.39 and 2.58 frames per second faster than YOLACT and YOLACT++, respectively. In summary, SwinT-YOLACT segments maize ears well in the vision system of unmanned inspection robots and can provide technical support for maize growth status monitoring.
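As a companion to the activation-function change reported above, here is a minimal PyTorch sketch of the Mish activation, f(x) = x * tanh(softplus(x)), together with a hypothetical helper that swaps every ReLU in a network for Mish. The tiny Sequential network is only a stand-in for YOLACT; the paper's actual code is not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish: x * tanh(softplus(x)); smooth and non-monotonic, unlike ReLU.
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

def replace_relu_with_mish(module: nn.Module) -> None:
    # Recursively replace every nn.ReLU submodule with Mish, in place.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, Mish())
        else:
            replace_relu_with_mish(child)

# Hypothetical stand-in network; in the paper the swap targets YOLACT's ReLU layers.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
)
replace_relu_with_mish(model)
print(model)  # both ReLU layers are now Mish

Because the swap only changes the activation modules, the network structure and parameter count are untouched, so any accuracy difference comes from the smoother activation rather than added capacity.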
Keywords:image segmentation  attention mechanism  maize ear  YOLACT  Swin-Transformer