Apple detection in complex orchard environment based on improved RetinaNet
Cite this article: Sun Jun, Qian Lei, Zhu Weidong, Zhou Xin, Dai Chunxi, Wu Xiaohong. Apple detection in complex orchard environment based on improved RetinaNet[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(15): 314-322
Authors: Sun Jun, Qian Lei, Zhu Weidong, Zhou Xin, Dai Chunxi, Wu Xiaohong
Affiliation: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
Funding: Project of the Faculty of Agricultural Equipment of Jiangsu University (NZXB20210210); Priority Academic Program Development of Jiangsu Higher Education Institutions, Phase III (PAPD-2018-87)
Abstract: To detect apple fruits quickly and accurately in complex orchard environments with overlapping and occluded fruit, this study proposes an apple detection network based on an improved RetinaNet. First, the Res2Net module is embedded in ResNet50, the backbone of the traditional RetinaNet, to strengthen the extraction of basic apple features. Second, a weighted Bi-directional Feature Pyramid Network (BiFPN) fuses features at different scales with learned weights, improving recall for small and occluded targets. Finally, the network is optimized with a joint loss function based on Focal Loss and Efficient Intersection over Union (EIoU) Loss to raise detection accuracy. Experimental results show that, on the test set, the improved network reaches detection accuracies of 94.02%, 86.74%, 89.42% and 94.84% for apples occluded by leaves, by branches/wires, by other fruits, and unoccluded apples, respectively; the mean Average Precision (mAP) reaches 91.26%, 5.02 percentage points higher than the traditional RetinaNet, and detecting one apple image takes 42.72 ms. Compared with mainstream networks such as Faster R-CNN and YOLOv4, the improved network achieves the best overall performance.
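The joint loss described above pairs Focal Loss on the classification branch with EIoU Loss on the box-regression branch. Below is a minimal PyTorch sketch of the EIoU term only, written from the generic definition of the loss (1 - IoU plus normalized penalties on the center distance and on the width/height gaps); the (x1, y1, x2, y2) box layout, variable names, and epsilon value are illustrative assumptions rather than the authors' implementation.

import torch

def eiou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes in (x1, y1, x2, y2) format (assumed layout).
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    # Union and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box and its squared diagonal
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared distance between box centers
    rho2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 \
         + ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4
    # Width/height penalties, each normalized by the enclosing box
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    # Per-box loss; reduce with .mean() or .sum() as needed
    return 1 - iou + rho2 / c2 + (pw - tw) ** 2 / (cw ** 2 + eps) + (ph - th) ** 2 / (ch ** 2 + eps)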

Keywords: image recognition; picking robot; apple detection; RetinaNet; BiFPN; EIoU; occlusion
Received: 2022-03-09
Revised: 2022-07-20

Apple detection in complex orchard environment based on improved RetinaNet
Sun Jun, Qian Lei, Zhu Weidong, Zhou Xin, Dai Chunxi, Wu Xiaohong. Apple detection in complex orchard environment based on improved RetinaNet[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(15): 314-322
Authors: Sun Jun, Qian Lei, Zhu Weidong, Zhou Xin, Dai Chunxi, Wu Xiaohong
Affiliation: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
Abstract: Fast and accurate detection is one of the most important prerequisites for apple harvest robots. However, many factors make apple detection difficult in a real orchard scene, such as complex backgrounds, fruit overlap, and leaf/branch occlusion. In this study, a fast and stable network for apple detection was proposed based on an improved RetinaNet, together with a picking strategy for the harvest robot. Specifically, if apples occluded by branches or wires were treated as picking targets, the robot arm could be damaged during picking; the apples were therefore labeled into multiple classes according to the type of occlusion. The Res2Net module was embedded in ResNet50 to improve the ability of the backbone network to extract multi-scale features. Furthermore, the BiFPN was used instead of the FPN as the feature-fusion network in the neck, performing a weighted fusion of feature maps at different scales so that apples of different sizes are detected more accurately (an illustrative sketch of this weighted fusion follows the abstract). A joint loss function combining Focal Loss and Efficient Intersection over Union (EIoU) Loss was then used to optimize the network: Focal Loss served as the classification loss, reducing the errors caused by the imbalance between positive and negative samples, while EIoU Loss served as the bounding-box regression loss, maintaining fast and accurate regression across the different relative positions of the predicted and ground-truth boxes, such as overlap, disjointness, and inclusion. Finally, classification and regression were carried out on feature maps at five scales to achieve better apple detection. The original dataset consisted of 800 apple images with the complex backgrounds of dense orchards; to improve the generalization ability of the model, it was expanded to 4800 images through data augmentation operations such as rotation, brightness adjustment, and noise addition. To balance detection accuracy and speed, experiments were conducted on the number of BiFPN stacks, and the BiFPN was stacked five times in the improved RetinaNet. Ablation experiments showed that each improvement raised the detection accuracy of the network compared with the original. The average precision of the improved RetinaNet reached 94.02%, 86.74%, 89.42%, and 94.84% for apples occluded by leaves, by branches/wires, by other fruits, and unoccluded apples, respectively. The mean Average Precision (mAP) reached 91.26%, 5.02 percentage points higher than that of the traditional RetinaNet, and the improved RetinaNet took only 42.72 ms on average to process an apple image. Given a fruit-picking cycle of 2.78 s, the detection speed fully met the requirements of the picking robot. When the apples were large or barely occluded, both the improved and the traditional RetinaNet detected them accurately; in complex orchard scenes with leaf, fruit, or branch/wire occlusion, however, only the improved RetinaNet detected all apple fruits, while the traditional RetinaNet often missed detections.
Consequently, compared with state-of-the-art detection networks such as Faster R-CNN and YOLOv4, the improved network achieved the best comprehensive performance, verifying the effectiveness of the improvements. Overall, apples of all occlusion classes can be detected effectively for harvesting, which supports the picking strategy of the robot and helps it avoid potential damage from branches and wires during harvesting.
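As a complement to the description of the BiFPN neck above, the sketch below illustrates the fast normalized (weighted) fusion applied at a single BiFPN node: every input feature map receives a learnable non-negative weight that is normalized before the maps are summed. The class name, channel count, and feature-map sizes are hypothetical and serve only to illustrate the weighted-fusion idea, not to reproduce the network reported in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    # Fast normalized fusion at a single BiFPN node (illustrative sketch).
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # inputs: list of feature maps with identical shape (N, C, H, W)
        w = F.relu(self.weights)          # keep the learnable weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so the weights sum to roughly 1
        return sum(wi * x for wi, x in zip(w, inputs))

# Hypothetical usage: fuse a same-scale lateral map with an upsampled coarser map
fuse = WeightedFusion(num_inputs=2)
p4_lateral = torch.randn(1, 256, 40, 40)
p5_upsampled = torch.randn(1, 256, 40, 40)
p4_td = fuse([p4_lateral, p5_upsampled])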
Keywords: image recognition; picking robot; apple detection; RetinaNet; BiFPN; EIoU; occlusion