Efficient detection method for young apples based on the fusion of convolutional neural network and visual attention mechanism
Citation: Song Huaibo, Jiang Mei, Wang Yunfei, Song Lei. Efficient detection method for young apples based on the fusion of convolutional neural network and visual attention mechanism[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(9): 297-303. DOI: 10.11975/j.issn.1002-6819.2021.09.034
Authors: Song Huaibo, Jiang Mei, Wang Yunfei, Song Lei
Affiliation: 1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China; 3. Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Services, Yangling 712100, China
Funding: National Key Research and Development Program of China (2019YFD1002401); National Natural Science Foundation of China (31701326); National High Technology Research and Development Program of China (863 Program) (2013AA10230402)
Abstract: High-throughput, automatic acquisition of fruit phenotypic data is the basis of breeding research on new fruit tree varieties, and accurate detection of young fruits is the key to obtaining growth data. Fruits at the young stage are tiny and similar in color to the leaves, which makes detection difficult. To achieve efficient detection of young apples in the natural environment, an improved YOLOv4 network model (YOLOv4-SENL) was proposed by fusing two visual attention mechanisms, the Squeeze-and-Excitation block (SE block) and the Non-Local block (NL block). After the backbone network of YOLOv4 extracted high-level visual features, the SE block integrated the high-level features along the channel dimension to strengthen the channel information. NL blocks were added to the three paths of the improved Path Aggregation Network (PAN) to enhance the features by combining non-local and local information. The two visual attention mechanisms re-integrated the high-level features from the channel and non-local aspects, emphasizing the channel information and long-range dependencies in the features and improving the ability of the network to capture the characteristics of the background and fruits. Finally, the coordinates and classes of young fruits of different sizes were computed from feature maps of different scales. After training on 1 920 training-set images, the average precision of the network on the 600 test-set images was 96.9%, which was 6.9, 1.5, and 0.2 percentage points higher than that of the SSD, Faster R-CNN, and YOLOv4 models, respectively, indicating that the algorithm can accurately detect young apple targets. Ablation experiments on the 480 validation-set images showed that retaining only the SE block of YOLOv4-SENL improved the precision by 3.8 percentage points over the YOLOv4 model, retaining only the three NL blocks improved the precision by 2.7 percentage points, and swapping the positions of the SE block and the NL blocks in YOLOv4-SENL improved the precision by 4.1 percentage points over the YOLOv4 model. These results show that the two visual attention mechanisms can significantly improve the network's perception of young apples with only a small increase in parameters. The results can provide a reference for obtaining fruit information in fruit tree breeding research.
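As a rough illustration of the two attention modules mentioned above, the PyTorch sketch below shows a generic Squeeze-and-Excitation block (channel re-weighting) and a simplified embedded-Gaussian non-local block (long-range dependencies) applied to a high-level feature map. The class names, the reduction ratio of 16, the halved internal channel count, and the 1 024-channel 13x13 example feature map are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        """Squeeze-and-Excitation: re-weights channels of a feature map (reduction=16 is an assumed value)."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average per channel
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),                        # excitation: per-channel weights in (0, 1)
            )

        def forward(self, x):
            b, c, _, _ = x.shape
            w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
            return x * w                             # channel-wise re-scaling of the input features

    class NonLocalBlock(nn.Module):
        """Embedded-Gaussian non-local block: every position attends to all other positions."""
        def __init__(self, channels):
            super().__init__()
            inter = channels // 2                    # assumed internal channel reduction
            self.theta = nn.Conv2d(channels, inter, 1)
            self.phi = nn.Conv2d(channels, inter, 1)
            self.g = nn.Conv2d(channels, inter, 1)
            self.out = nn.Conv2d(inter, channels, 1)

        def forward(self, x):
            b, _, h, w = x.shape
            q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
            k = self.phi(x).flatten(2)                     # (b, c', hw)
            v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
            attn = torch.softmax(q @ k, dim=-1)            # pairwise affinities over all positions
            y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
            return x + self.out(y)                         # residual: long-range info added to local features

    # Hypothetical usage on a 1 024-channel, 13x13 high-level feature map
    feat = torch.randn(1, 1024, 13, 13)
    feat = NonLocalBlock(1024)(SEBlock(1024)(feat))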

Keywords: machine vision   image processing   young apples   fruit detection   YOLOv4   convolutional neural network   visual attention mechanism
Received: 2021-01-15
Revised: 2021-04-28

Efficient detection method for young apples based on the fusion of convolutional neural network and visual attention mechanism
Song Huaibo, Jiang Mei, Wang Yunfei, Song Lei. Efficient detection method for young apples based on the fusion of convolutional neural network and visual attention mechanism[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(9): 297-303. DOI: 10.11975/j.issn.1002-6819.2021.09.034
Authors:Song Huaibo  Jiang Mei  Wang Yunfei  Song Lei
Affiliation:1.College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China;2.Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China;3.Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Services, Yangling 712100, China
Abstract:Accurate detection of young fruits is critical to obtaining growth data, particularly for the high-throughput, automatic acquisition of phenotypic information that serves as the basis of fruit tree breeding. Since fruits at the young stage are small and similar in color to the leaves, they are difficult to detect with deep learning. In this study, an improved YOLOv4 network model (YOLOv4-SENL) was proposed to achieve highly efficient detection of young apples in a natural environment. Squeeze-and-Excitation (SE) and Non-Local (NL) blocks were combined to detect young apples. The backbone network of YOLOv4 was utilized to extract high-level features, while the SE block was used to reorganize and consolidate the high-level features in the channel dimension to enhance the channel information. NL blocks were added to the three paths of the improved Path Aggregation Network (PAN), combining non-local information with the local information obtained by convolution operations to enhance the features. The two visual attention mechanisms (SE and NL blocks) were used to re-integrate the high-level features from both the channel and non-local aspects, with emphasis on the channel information and long-range dependencies in the features. As such, the ability of the network to capture the characteristics of the background and fruits was improved. Finally, the coordinates and classes of young apples of different sizes were predicted from feature maps of different scales. During network training, the pre-trained weights of the backbone network on the MS COCO dataset were loaded, and stochastic gradient descent was used to update the parameters. The initial parameters were set as follows: the initial learning rate was 0.01, the number of training epochs was 350, the weight decay rate was 0.000 484, and the momentum factor was 0.937. A total of 3 000 images covering young fruits at different periods and under different interference factors were collected in the natural environment, providing abundant samples. Four indexes were selected to evaluate the detection performance of the models in the experiments, including the recall rate, F1 score, and average precision. The network was trained on 1 920 images of the dataset, and its average precision on the 600 test-set images was 96.9%, which was 6.9, 1.5, and 0.2 percentage points higher than that of the SSD, Faster R-CNN, and YOLOv4 models, respectively. The size of the YOLOv4-SENL model was 69 M larger than that of the SSD model, 59 M smaller than that of the Faster R-CNN model, and 11 M larger than that of the YOLOv4 model. These results indicated that young apple objects could be detected accurately. The ablation experiments on the 480 validation-set images showed that retaining only the SE block in YOLOv4-SENL improved the precision by 3.8 percentage points compared with the YOLOv4 model; retaining only the three NL block visual attention modules in YOLOv4-SENL improved the precision by 2.7 percentage points compared with the YOLOv4 model; and swapping the positions of the SE block and the NL blocks in YOLOv4-SENL improved the precision by 4.1 percentage points compared with the YOLOv4 model. These results indicated that the two visual attention mechanisms significantly improved the perception of young apples by the network with only a small increase in parameters. These findings can provide a reference for obtaining growth information in fruit tree breeding.
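A minimal training-loop sketch using the hyper-parameters reported above (stochastic gradient descent, initial learning rate 0.01, momentum 0.937, weight decay 0.000 484, 350 epochs) might look as follows; model, train_loader, and compute_loss are placeholders for the authors' YOLOv4-SENL network, dataset, and loss, which are not given here.

    import torch

    def train(model, train_loader, compute_loss, device="cuda"):
        """Sketch of the reported training setup; any learning-rate schedule is not described and is omitted."""
        model.to(device)
        # MS COCO pre-trained backbone weights would be loaded before this loop (not shown).
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                    momentum=0.937, weight_decay=0.000484)
        for epoch in range(350):                              # 350 training epochs
            for images, targets in train_loader:
                images = images.to(device)
                loss = compute_loss(model(images), targets)   # YOLO-style detection loss (placeholder)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()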
Keywords:machine vision   nondestructive detection   young apples   fruit detection   YOLOv4   convolutional neural network   visual attention mechanism
This article is indexed by CNKI, Wanfang Data, and other databases.