
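YOLOv5 crop pest recognition model incorporating a global response normalization attention mechanism
Cite this article: 郭嘉璇, 王蓉芳, 南江华, 李小虎, 焦昶哲. YOLOv5 crop pest recognition model incorporating a global response normalization attention mechanism[J]. Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), 2024, 40(8): 159-170
Authors: 郭嘉璇  王蓉芳  南江华  李小虎  焦昶哲
Affiliations: School of Artificial Intelligence, Xidian University, Xi'an 710071; Agricultural and Rural Affairs Bureau of Pucheng County, Weinan, Shaanxi Province, Weinan 715500; Plant Protection and Phytosanitary Station of Pucheng County, Weinan, Shaanxi Province, Weinan 715500
Funding: National Natural Science Foundation of China (62176196); Shaanxi Province Key Research and Development Program, General Project, Agriculture (2023-YBNY-218, 2023-YBNY-284)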
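Abstract: The detection performance of the YOLOv5 (you only look once version five) model on dense crop pest targets does not meet practical requirements, and the model converges slowly during training. To address these problems, this study proposes a YOLOv5 crop pest recognition model that incorporates the global response normalization (GRN) attention mechanism (YOLOv5-GRNS). An encoder (convolution three, C3) module incorporating the GRN attention mechanism was designed to improve recognition accuracy on dense targets. The shape intersection over union (SIoU) loss function was used to improve the model's convergence speed and recognition accuracy. Based on the public dataset IP102 (insect pests 102), eight pest types that damage the major crops of Shaanxi Province were selected to build a new dataset, IP8-CW (insect pests eight for corn and wheat). The improved model was validated comprehensively on both the new IP8-CW dataset and the complete IP102 dataset. On IP8-CW, the mean average precision (mAP) reached 72.3% for mAP@.5 and 47.0% for mAP@.5:.95. The study also analyzed the YOLOv5-GRNS model with class activation maps, verifying its strong recognition of crop pests, and of dense targets in particular, in terms of both recognition accuracy and interpretability. In addition, the model has few parameters and a low computational cost, which gives it good prospects for deployment on embedded devices.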
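The abstract names the GRN operation but does not spell it out. The following is a minimal sketch of global response normalization as it is commonly formulated (global aggregation, divisive normalization, calibration with a residual connection). It assumes a single feature map stored as nested lists `x[channel][row][col]`; the function name `grn`, the parameters `gamma`, `beta`, and `eps`, and the pure-Python layout are illustrative assumptions, and how the authors wire GRN into the C3 module is not stated here.

```python
import math


def grn(x, gamma, beta, eps=1e-6):
    """Global response normalization on a feature map x[c][h][w] (nested lists).

    gamma and beta are per-channel learnable scalars (hypothetical names).
    """
    # 1) Global aggregation: L2 norm of each channel over all spatial positions.
    g = [math.sqrt(sum(v * v for row in ch for v in row)) for ch in x]
    # 2) Divisive normalization: each channel's norm relative to the mean norm.
    mean_g = sum(g) / len(g)
    n = [gc / (mean_g + eps) for gc in g]
    # 3) Calibration plus residual: gamma * (x * n) + beta + x.
    return [
        [[gamma[c] * v * n[c] + beta[c] + v for v in row] for row in ch]
        for c, ch in enumerate(x)
    ]


# Two channels, one row each: a strongly responding channel and a silent one.
x = [[[3.0, 4.0]], [[0.0, 0.0]]]
y = grn(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
identity = grn(x, gamma=[0.0, 0.0], beta=[0.0, 0.0])
```

With `gamma` and `beta` at zero the operation reduces to the identity, so a block containing it starts out behaving like the unmodified network. With nonzero `gamma`, channels whose global response exceeds the average are amplified relative to weaker ones, which is the channel-level competition that the abstract credits with reducing background interference.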

Keywords: image recognition  pest detection  YOLOv5  GRN attention  dense small targets
Received: 2023-11-15
Revised: 2024-03-25

YOLOv5 model integrated with GRN attention mechanism for insect pest recognition
GUO Jiaxuan, WANG Rongfang, NAN Jianghu, LI Xiaohu, JIAO Changzhe. YOLOv5 model integrated with GRN attention mechanism for insect pest recognition[J]. Transactions of the Chinese Society of Agricultural Engineering, 2024, 40(8): 159-170
Authors: GUO Jiaxuan  WANG Rongfang  NAN Jianghu  LI Xiaohu  JIAO Changzhe
Affiliation: School of Artificial Intelligence, Xidian University, Xi'an 710071, China; Agricultural and Rural Affairs Bureau of Pucheng, Shaanxi Province, Weinan 715500, China; Plant Protection and Phytosanitary Station of Pucheng, Shaanxi Province, Weinan 715500, China
Abstract: Monitoring pests over large field areas requires automatic, rapid, and accurate detection. In this study, the YOLOv5 (you only look once version five) model was used to detect crop pests. The global response normalization (GRN) attention mechanism was incorporated into the existing YOLOv5 to form the YOLOv5-GRNS model. The model accurately detected targets in images with complex backgrounds and high pest density, and it also converged rapidly during training. Firstly, the GRN operation was introduced into the encoder module, Convolution Three (C3), yielding a C3 module that incorporates the GRN attention mechanism. This module exchanges information across channels to reduce background interference at the channel level, thereby improving detection accuracy on dense targets. Secondly, the Shape Intersection over Union (SIoU) loss function was used to improve the convergence speed and detection accuracy of the improved model. In addition, 8 types of pests that harm major crops in Shaanxi Province were selected from the public dataset IP102 (insect pests 102). This subset was then revised and expanded to obtain a new dataset, IP8-CW (insect pests eight for corn and wheat). Extensive experiments with the YOLOv5-GRNS model were conducted on both the new IP8-CW dataset and the existing IP102 dataset. On IP8-CW, the mean average precision (mAP) reached 72.3% for mAP@0.5 and 47.0% for mAP@0.5:0.95, improvements of 1.3% and 1.6%, respectively, over the standard YOLOv5. The model also achieved the best performance on the larger IP102 dataset with a 96-class classification task, with lower complexity and fewer parameters. Ablation experiments were then conducted on the IP8-CW dataset to explore the influence of different factors on YOLOv5-GRNS performance.
The results showed that, with the SIoU loss function, the prediction box followed a more regular path as it fitted the ground-truth box. Convergence was thus advanced by 30 epochs compared with the other two loss functions. Performance improved significantly with the sandwich structure, in which the GRN operation serves as both the normalization layer and the channel attention layer; this structure also outperformed structures in which GRN served as only one of the two. The ablation experiments showed that the other attention mechanisms brought only slight improvements to the standard YOLOv5 model. The improved model with the GRN operation achieved the best detection performance with the lowest model complexity and the fewest parameters. Class Activation Maps (CAM) are heat maps that mark in red the key locations on which the model focuses; this property was used to verify the effectiveness of the improved attention mechanism in the YOLOv5-GRNS model. Results on three datasets showed that the YOLOv5-GRNS model focused tightly and accurately on the target area, even in images with complex backgrounds or dense targets. In summary, YOLOv5-GRNS can be expected to serve as a robust solution for pest detection. Its strong performance on small and dense targets was verified on different datasets, indicating better generalization and interpretability with fewer parameters and lower computational complexity. The improved model can also be deployed on embedded and mobile devices.
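Both the SIoU loss and the reported mAP@0.5 and mAP@0.5:0.95 figures build on the overlap between a predicted box and a ground-truth box. The sketch below computes plain intersection over union (not SIoU, which adds angle, distance, and shape terms) and lists the ten IoU thresholds that the 0.5:0.95 notation conventionally averages over. Boxes are assumed to be `(x1, y1, x2, y2)` corner tuples; the names `iou`, `THRESHOLDS`, and `thresholds_passed` are hypothetical and chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Width and height of the overlap rectangle, clamped at zero if disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used when averaging AP for mAP@0.5:0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Thresholds at which this prediction would count as a correct match."""
    overlap = iou(pred, truth)
    return [t for t in THRESHOLDS if overlap >= t]


pred = (0.0, 0.0, 10.0, 10.0)
truth = (2.0, 0.0, 12.0, 10.0)
overlap = iou(pred, truth)              # 80 / 120, about 0.667
passed = thresholds_passed(pred, truth)
```

A prediction shifted by a fifth of its width still counts as a hit at the 0.5 threshold but fails the stricter ones, which is why mAP@0.5:0.95 (47.0% here) sits well below mAP@0.5 (72.3%) and rewards tighter box fitting.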
Keywords: image recognition  pest detection  YOLOv5  GRN attention  dense small target