Key point detection of tractor drivers in complex environments based on improved YOLO-Pose
Cite this article: XU Hongmei, YANG Hao, LI Yalin, ZHANG Wenjie, ZHAO Yabing, WU Qing. Key point detection of tractor drivers in complex environments based on improved YOLO-Pose[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(16): 139-149.
Authors: XU Hongmei  YANG Hao  LI Yalin  ZHANG Wenjie  ZHAO Yabing  WU Qing
Institution: College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Agricultural Equipment in the Mid-lower Reaches of the Yangtze River, Ministry of Agriculture and Rural Affairs, Wuhan 430070, China
Funding: General Program of the National Natural Science Foundation of China (52175232)
Abstract: To address the missed and false detections of key points caused by illumination, background, and occlusion that make tractor drivers difficult to recognize in complex farmland operating environments, this study proposed a driver key point detection method for complex environments based on an improved YOLO-Pose. First, a Swin Transformer encoder was embedded in the top-level C3 module of the backbone network to improve key point detection under occlusion. Second, the efficient layer-aggregation network RepGFPN was adopted as the neck network; by fusing high-level semantic information with low-level spatial information it enhances multi-scale detection, and pyramid convolution was used in the neck to replace the standard 3×3 convolution, reducing the number of model parameters while effectively capturing feature information at different levels. Finally, a coordinate attention mechanism was embedded to optimize the key point decoupled head, making the prediction process more sensitive to the spatial positions of key points. Experimental results show that the improved algorithm achieved an mAP0.5 (mean average precision at an object keypoint similarity, Loks, threshold of 0.5) of 89.59% and an mAP0.5:0.95 (mean average precision averaged over Loks thresholds of 0.5, 0.55, ..., 0.95) of 62.58%, which are 4.24 and 4.15 percentage points higher than the baseline model, respectively, with an average detection time of 21.9 ms per image; compared with the current mainstream key point detection network Hou...
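The mAP figures above are defined in terms of object keypoint similarity (OKS/Loks) thresholds. The following is a minimal sketch of how such a metric can be evaluated for one pose, assuming Python with NumPy; the per-keypoint constants and the toy pose are illustrative and not taken from the paper.

# Minimal sketch of the OKS-based metric referenced in the abstract:
# a prediction counts as a correct match at threshold t when OKS >= t, and
# mAP0.5:0.95 averages precision over t = 0.50, 0.55, ..., 0.95.
# The per-keypoint constants `kappas` below are illustrative values only.
import numpy as np

def oks(pred, gt, visibility, area, kappas):
    """Object keypoint similarity between one predicted and one
    ground-truth pose; pred and gt have shape (K, 2)."""
    d2 = np.sum((pred - gt) ** 2, axis=1)        # squared keypoint distances
    e = d2 / (2.0 * area * kappas ** 2 + 1e-9)   # error normalized by object scale
    vis = visibility > 0
    return float(np.sum(np.exp(-e)[vis]) / max(vis.sum(), 1))

# Toy example with 4 keypoints on a driver occupying a 120 x 80 px box.
gt   = np.array([[100., 60.], [120., 60.], [110., 90.], [110., 130.]])
pred = gt + np.random.normal(0, 3.0, gt.shape)   # small localization error
kappas = np.array([0.026, 0.026, 0.079, 0.107])  # hypothetical constants
score = oks(pred, gt, np.ones(4), 120 * 80, kappas)

thresholds = np.arange(0.50, 1.00, 0.05)         # 0.50, 0.55, ..., 0.95
matches = score >= thresholds
print(f"OKS = {score:.3f}; correct at {matches.sum()}/10 thresholds")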

Keywords: tractor  deep learning  detection  driver  YOLO-Pose  key point
Received: 2023-05-18
Revised: 2023-08-07

Detecting the key points of tractor drivers under complex environments using improved YOLO-Pose
XU Hongmei,YANG Hao,LI Yalin,ZHANG Wenjie,ZHAO Yabing,WU Qing.Detecting the key points of tractor drivers under complex environments using improved YOLO-Pose[J].Transactions of the Chinese Society of Agricultural Engineering,2023,39(16):139-149.
Authors:XU Hongmei  YANG Hao  LI Yalin  ZHANG Wenjie  ZHAO Yabing  WU Qing
Institution:College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; Key Laboratory of Agricultural Equipment in Mid-lower Reaches of the Yangtze River, Ministry of Agriculture and Rural Affairs, Wuhan 430070, China
Abstract:Missed and false detections of key points caused by illumination, background clutter, and occlusion in complex farmland operating environments make tractor driver recognition difficult. In this study, a driver key point detection method for complex environments was proposed based on an improved YOLO-Pose. First, a Swin Transformer encoder was embedded in the top-level C3 module of the CSPDarkNet53 backbone, with a window size of 8 and 16 self-attention heads. The encoder used shifted-window multi-head self-attention (SW-MSA) to learn cross-window interactions, and a masking mechanism isolated invalid information exchange between pixels of non-adjacent regions in the original feature map. Compared with the conventional ViT architecture, this design performs better on dense prediction and high-resolution vision tasks, allowing the improved model to capture global dependencies with high computational efficiency; the stronger global modelling in turn improved key point detection under occlusion. Second, RepGFPN, an efficient layer-aggregation network with skip and cross-scale connections, was adopted as the neck, and a P6 detection layer was added to the multi-scale outputs of the backbone. Its CspStage modules combine reparameterization with layer-aggregation connectivity to fuse high-level semantic information with low-level spatial information, enhancing the model's multi-scale detection capability. Third, a pyramid convolution with a four-level pyramid structure replaced the standard 3×3 convolution to further optimize the neck. Its convolution kernels grow larger level by level from the bottom up, adaptively adjusting the receptive field, so the number of model parameters is reduced while feature information from different levels is still captured effectively. Finally, a coordinate attention mechanism was embedded in the key point decoupled head, encoding horizontal and vertical position information into the channel attention. The network thus acquires cross-channel, direction-aware, and position-sensitive information, captures the positional relationships between key points more accurately during prediction, and achieves higher key point detection accuracy in complex environments. Experimental results show that the improved model achieved an mAP0.5 (mean average precision at an object keypoint similarity, Loks, threshold of 0.5) of 89.59% and an mAP0.5:0.95 (mean average precision averaged over Loks thresholds of 0.5, 0.55, ..., 0.95) of 62.58%, which are 4.24 and 4.15 percentage points higher than the baseline model, respectively, with an average detection time of 21.9 ms per image. Compared with the current mainstream key point detection networks Hourglass, HRNet-W32, and DEKR, the mAP0.5 was improved by 7.94, 5.27, and 2.66 percentage points, and the model size was reduced by 257.5, 8.2, and 9.3 M, respectively. The improved method therefore offers high detection accuracy and inference speed for key points in complex scenes, especially when the driver is self-occluded or occluded by other objects. The findings can provide a solid theoretical basis for driver behavior recognition and state monitoring in farmland operating environments.
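The abstract describes embedding a Swin Transformer encoder (window size 8, 16 attention heads) in the top-level C3 module of the backbone. The sketch below, assuming PyTorch, shows a simplified window self-attention block placed at the end of a C3-style module; it omits the shifted-window masking (SW-MSA) and relative position bias of the full Swin encoder, and the class names are illustrative rather than the paper's implementation.

# Simplified sketch: window-based self-attention at the end of a C3-style block.
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    def __init__(self, channels, window=8, heads=16):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W), H and W divisible by window
        b, c, h, w = x.shape
        ws = self.window
        # partition into non-overlapping ws x ws windows -> (B*nW, ws*ws, C)
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t = self.norm(t)
        a, _ = self.attn(t, t, t)              # self-attention within each window
        a = a + t                              # residual over the normalized windows
        # reverse the window partition back to (B, C, H, W)
        a = a.view(b, h // ws, w // ws, ws, ws, c)
        return a.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

class C3WithSwin(nn.Module):
    """C3-style bottleneck whose last stage is the window-attention encoder."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU())
        self.encoder = WindowSelfAttention(channels)

    def forward(self, x):
        return self.encoder(self.conv(x))

feat = torch.randn(1, 256, 16, 16)             # top-level feature map
print(C3WithSwin(256)(feat).shape)             # torch.Size([1, 256, 16, 16])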
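As a rough illustration of the cross-scale fusion performed by the RepGFPN neck, the sketch below merges a high-level (semantically rich, low-resolution) feature map with a low-level (spatially detailed) one. The real RepGFPN additionally uses reparameterized CspStage blocks, skip connections, and an extra P6 output, none of which are reproduced here; names and channel sizes are assumptions.

# Bare-bones sketch of fusing high-level semantic and low-level spatial features.
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    def __init__(self, c_high, c_low, c_out):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.merge = nn.Sequential(
            nn.Conv2d(c_high + c_low, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, high, low):
        # align resolutions, then fuse semantic and spatial information
        return self.merge(torch.cat([self.up(high), low], dim=1))

p5 = torch.randn(1, 512, 8, 8)      # high-level semantic feature
p4 = torch.randn(1, 256, 16, 16)    # lower-level spatial feature
print(FuseBlock(512, 256, 256)(p5, p4).shape)   # torch.Size([1, 256, 16, 16])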
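The pyramid convolution that replaces the standard 3×3 convolution can be sketched as four parallel grouped convolutions with increasing kernel sizes whose outputs are concatenated, so the receptive field grows across levels while the parameter count stays modest. The kernel sizes and group counts below are typical pyramid-convolution choices, not necessarily those used in the paper.

# Hedged sketch of a 4-level pyramid convolution.
import torch
import torch.nn as nn

class PyConv(nn.Module):
    def __init__(self, c_in, c_out, kernels=(3, 5, 7, 9), groups=(1, 4, 8, 16)):
        super().__init__()
        c_branch = c_out // len(kernels)
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_branch, k, padding=k // 2, groups=g, bias=False)
            for k, g in zip(kernels, groups))
        self.bn_act = nn.Sequential(nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        # each branch sees the same input but with a different receptive field
        return self.bn_act(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 128, 32, 32)
print(PyConv(128, 128)(x).shape)    # torch.Size([1, 128, 32, 32])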
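The coordinate attention embedded in the key point decoupled head can be sketched as follows: pooling along the height and width directions encodes position into the channel attention, yielding direction-aware, position-sensitive weights. The reduction ratio and module placement below are illustrative assumptions.

# Minimal sketch of a coordinate attention block.
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        hidden = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool along height -> (B, C, 1, W)
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.SiLU())
        self.attn_h = nn.Conv2d(hidden, channels, 1)
        self.attn_w = nn.Conv2d(hidden, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                              # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)          # (B, C, W, 1)
        y = self.reduce(torch.cat([xh, xw], dim=2))      # joint encoding of both directions
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.attn_h(yh))                        # height-wise weights
        aw = torch.sigmoid(self.attn_w(yw.permute(0, 1, 3, 2)))    # width-wise weights
        return x * ah * aw                               # position-aware reweighting

x = torch.randn(1, 64, 40, 40)
print(CoordAttention(64)(x).shape)   # torch.Size([1, 64, 40, 40])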
Keywords:tractor  deep learning  detection  driver  YOLO-Pose  keypoint