Posture change recognition of lactating sow by using 2D-3D convolution feature fusion
Cite this article: Xue Yueju, Li Shimei, Zheng Chan, Gan Haiming, Li Chengpeng, Liu Hongshan. Posture change recognition of lactating sow by using 2D-3D convolution feature fusion[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(9): 230-237.
Authors: Xue Yueju  Li Shimei  Zheng Chan  Gan Haiming  Li Chengpeng  Liu Hongshan
Affiliations: 1. College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China; 2. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
Funding: Guangzhou Science and Technology Plan Project (201604016122); Special Project in Key Fields of Guangdong Universities (2020ZDZX1041); Guangdong Science and Technology Plan Project (2021A0505030058)
Abstract: Posture changes of sows affect piglet survival, and the amplitude and duration of these movements vary considerably, which makes accurate recognition difficult. This study proposed a convolutional network fusing 2D and 3D convolutional features (2D+3D-CNet, 2D+3D Convolutional Network) to recognize sow posture changes in depth images. Taking video clips as input, an SE attention module and 3D dilated convolution were introduced to enhance the spatiotemporal feature extraction capability of the 3D convolutional network for posture changes, while 2D convolution extracted the spatial features of the sow. After feature fusion, the action recognition branch output the probability that the sow was changing posture, and the posture classification branch output the probabilities of four posture classes; combining these two outputs identified eight types of posture change and reduced the workload of manual dataset annotation. Finally, an action score was designed to refine the temporal localization of sow posture changes. On the test set, 2D+3D-CNet achieved a posture change recognition precision of 97.95%, a recall of 91.67%, and a test speed of 14.39 frames/s; its precision, recall, and temporal localization accuracy were all higher than those of YOWO, FRCNN-HMM, and MOC-D. These results demonstrate high-accuracy recognition of sow posture changes.
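To make the SE attention mentioned above concrete, the sketch below (PyTorch, not the authors' code) shows a squeeze-and-excitation block adapted to 3D feature maps, of the kind that can be inserted into the residual units of 3D ResNeXt-50; the channel count, reduction ratio, and tensor shapes are illustrative assumptions.

import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """Channel attention (squeeze-and-excitation) for 5D feature maps."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)      # squeeze: average over T, H, W
        self.fc = nn.Sequential(                 # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _, _ = x.shape                  # x: (batch, channels, frames, H, W)
        w = self.fc(self.pool(x).view(b, c))     # (batch, channels) attention weights
        return x * w.view(b, c, 1, 1, 1)         # recalibrate channels

# Example: recalibrate features of a 16-frame clip from a 3D residual unit
feat = torch.randn(2, 256, 16, 14, 14)
out = SEBlock3D(256)(feat)                       # same shape as the input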

Keywords: neural networks  convolution  spatiotemporal feature fusion  action recognition  posture change  temporal localization
Received: 2021/1/1
Revised: 2021/3/16

Posture change recognition of lactating sow by using 2D-3D convolution feature fusion
Xue Yueju, Li Shimei, Zheng Chan, Gan Haiming, Li Chengpeng, Liu Hongshan. Posture change recognition of lactating sow by using 2D-3D convolution feature fusion[J]. Transactions of the Chinese Society of Agricultural Engineering, 2021, 37(9): 230-237.
Authors: Xue Yueju  Li Shimei  Zheng Chan  Gan Haiming  Li Chengpeng  Liu Hongshan
Institution: 1. College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China; 2. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
Abstract: Posture change of a lactating sow directly affects the preweaning survival rate of her piglets. Automated recognition of sow posture change enables early warning and can thus improve piglet survival, while the frequency, type, and duration of posture changes can be used to select sows with high maternal quality for breeding. However, accurately recognizing posture change actions is difficult because of the variety of posture changes and the differences in movement range and duration. In this study, a convolutional network with 2D-3D convolution feature fusion (2D+3D-CNet, 2D+3D Convolutional Network) was proposed to recognize sow posture change actions in depth images. Experimental data were collected from a commercial pig farm in Foshan City, Guangdong Province, South China. A Kinect 2.0 camera was fixed directly above the pen to record the daily activities of sows from a top view at 5 frames/s, and RGB-D video was collected with a depth image resolution of 512×424 pixels. Median filtering and histogram equalization were used to preprocess the dataset, and the video clips were then fed into 2D+3D-CNet for training and testing. 2D+3D-CNet consisted of spatiotemporal feature extraction, spatial feature extraction, feature fusion, action recognition, and posture classification, fully integrating video-level action recognition with frame-level posture classification. Firstly, 16-frame video clips were fed into the network, and 3D ResNeXt-50 and Darknet-53 were used to extract the spatiotemporal and spatial features of sow movement. An SE module was added to the residual structure of 3D ResNeXt-50 (named 3D SE-ResNeXt-50) to boost the representation power of the network. After feature fusion, the action recognition branch generated the sow bounding box and the probability of posture change. The sow bounding box was then mapped onto the feature map of the 13th convolutional layer of Darknet-53 to obtain the sow regional feature maps, which were fed into the posture classification branch to obtain the probabilities of the four postures. Considering the spatiotemporal motion and inter-frame posture variation during a posture change, an action score was designed to indicate the possibility of posture change, and a threshold was set on this score to determine the start and end time of a posture change action. Once the start and end times were determined, the specific posture change was classified by combining the sow postures one second before the start time and one second after the end time, so a specific posture change action can be recognized directly without collecting and annotating a large additional dataset. The 2D+3D-CNet model was trained with the PyTorch deep learning framework on an NVIDIA RTX 2080Ti GPU (graphics processing unit), and the algorithm was developed on the Ubuntu 16.04 platform. The performance was evaluated on the test set. The classification accuracies of lateral lying, standing, sitting, and ventral lying were 100%, 98.69%, 98.24%, and 98.19%, respectively. The total recognition accuracy of sow posture change actions was 97.95%, the total recall rate was 91.67%, and the inference speed was 14.39 frames/s. Compared with Faster RCNN-HMM, YOWO, and MOC-D, the accuracy increased by 4.28%, 5.06%, and 5.53%, and the recall rate increased by 3.83%, 3.65%, and 5.90%, respectively. The presented method removes the need for hand-crafted features and achieves real-time inference with more accurate temporal action localization.
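As a rough illustration of the action-score based temporal localization and posture-change labeling described above (an assumed sketch, not the published implementation), the code below thresholds a per-frame action score to find the start and end of a change and then names the transition from the postures one second before and after it; the threshold value and posture names are assumptions, while the 5 frames/s rate is taken from the abstract.

from typing import List, Optional, Tuple

FPS = 5             # frame rate reported in the abstract
THRESHOLD = 0.5     # assumed decision threshold on the action score

def localize_action(scores: List[float],
                    threshold: float = THRESHOLD) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the frames whose action score
    exceeds the threshold, or None if no posture change is detected."""
    above = [i for i, s in enumerate(scores) if s > threshold]
    return (above[0], above[-1]) if above else None

def label_posture_change(postures: List[str], start: int, end: int,
                         fps: int = FPS) -> str:
    """Name the transition from the posture 1 s before the start to the
    posture 1 s after the end, e.g. 'standing -> lateral lying'."""
    before = postures[max(0, start - fps)]
    after = postures[min(len(postures) - 1, end + fps)]
    return f"{before} -> {after}"

# Example with synthetic per-frame outputs of the two branches
scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1]
postures = ["standing"] * 3 + ["lateral lying"] * 4
span = localize_action(scores)
if span is not None:
    print(span, label_posture_change(postures, *span))   # (2, 4) standing -> lateral lying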
Keywords: neural networks  convolution  spatiotemporal feature fusion  action recognition  posture change  temporal localization