
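Pig continuous cough sound recognition based on continuous speech recognition technology (基于连续语音识别技术的猪连续咳嗽声识别)
Cite this article: Li Xuan (黎煊), Zhao Jian (赵建), Gao Yun (高云), Liu Wanghong (刘望宏), Lei Minggang (雷明刚), Tan Hequn (谭鹤群). Pig continuous cough sound recognition based on continuous speech recognition technology[J]. Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), 2019, 35(6): 174-180
Authors: Li Xuan (黎煊)  Zhao Jian (赵建)  Gao Yun (高云)  Liu Wanghong (刘望宏)  Lei Minggang (雷明刚)  Tan Hequn (谭鹤群)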
Affiliations: 1. College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; 2. Cooperative Innovation Center for Sustainable Pig Production, Wuhan 430070, China; 3. College of Animal Science and Technology, College of Animal Medicine, Huazhong Agricultural University, Wuhan 430070, China (Li Xuan, Zhao Jian, Gao Yun, Tan Hequn: 1, 2; Liu Wanghong, Lei Minggang: 2, 3)
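Funding: National Key Research and Development Program of China (2018YFD0500700); Huazhong Agricultural University Independent Science and Technology Innovation Fund; Huazhong Agricultural University Dabeinong Young Scholars Promotion Program (2017DBN005); China Agriculture Research System (CARS-36); National Undergraduate Training Program for Innovation and Entrepreneurship (201810504074)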
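Abstract (translated from the Chinese): Existing pig cough recognition methods based on isolated word recognition can recognize only a limited range of sound types and cannot reflect the continuous coughing of pigs that are actually sick. To address this, this paper proposes building a pig sound acoustic model based on the bidirectional long short-term memory-connectionist temporal classification (BLSTM-CTC) model to recognize continuous pig cough sounds in the pig farm environment, and thereby to support early warning and assessment of early respiratory disease in pigs. The time-domain characteristics of individual cough samples from Landrace pigs weighing about 75 kg, namely duration and energy, were studied, and a threshold range was constructed with a sample duration of 0.24 to 0.74 s and an energy greater than 40.15 V²·s. Within this threshold range, a single-parameter double-threshold endpoint detection algorithm was applied to 30 h of pig farm sound that had been processed by a multitaper-spectrum-based psychoacoustic speech enhancement algorithm, yielding 222 experimental utterances. Sounds in the pig farm environment were divided into pig cough and non-pig cough, which served as the modeling units of the acoustic model and were used to label the corpus. 26-dimensional Mel frequency cepstral coefficients (MFCC) were extracted as the feature parameters of the experimental utterances. The BLSTM network learned the variation patterns of continuous pig sounds, and CTC was used to realize an end-to-end continuous pig sound recognition system. In 5-fold cross-validation, the average pig cough recognition rate reached 92.40%, the false recognition rate was 3.55%, and the total recognition rate reached 93.77%. In addition, an application test of the algorithm on 1 h of corpus outside the data set gave a pig cough recognition rate of 94.23%, a false recognition rate of 9.09%, and a total recognition rate of 93.24%. These results indicate that the BLSTM-CTC pig cough recognition model based on continuous speech recognition technology is stable and reliable. The study can provide a reference for the recognition of continuous pig cough sounds and for disease assessment in healthy pig production.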

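Keywords: signal processing; acoustic signal; recognition; pig industry; continuous cough; bidirectional long short-term memory-connectionist temporal classification model; acoustic model
Received: 2018-11-09
Revised: 2019-01-13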

Pig continuous cough sound recognition based on continuous speech recognition technology
Li Xuan,Zhao Jian,Gao Yun,Liu Wanghong,Lei Minggang and Tan Hequn. Pig continuous cough sound recognition based on continuous speech recognition technology[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(6): 174-180
Authors:Li Xuan  Zhao Jian  Gao Yun  Liu Wanghong  Lei Minggang  Tan Hequn
Affiliation:1. College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; 2. Cooperative Innovation Center for Sustainable Pig Production, Wuhan 430070, China; 3. College of Animal Science and Technology, College of Animal Medicine, Huazhong Agricultural University, Wuhan 430070, China (Li Xuan, Zhao Jian, Gao Yun, Tan Hequn: 1, 2; Liu Wanghong, Lei Minggang: 2, 3)
Abstract: Cough is one of the most frequent symptoms in the early stage of pig respiratory diseases, so detecting coughs makes it possible to monitor and diagnose these diseases. Existing methods for pig cough recognition are based on isolated-word (keyword) recognition technology, which cannot recognize sound types that were not included in training. Another drawback is that these methods target isolated coughs, whereas the coughs of sick pigs are usually continuous. This paper aims to recognize continuous pig cough sounds using continuous speech recognition technology. Ten Landrace pigs with a body weight of about 75 kg were used as sound collection subjects, and pig sounds were collected on pig farms during late winter and early spring, when pig respiratory diseases are prevalent. The sound collection devices worked continuously all day. By selecting the phases of frequent coughing in the collected signal, a total of 30 h of pig farm sound signals was obtained as the experimental corpus. First, the sound signals were denoised by a speech enhancement algorithm based on a psychoacoustic model. Then the time-domain characteristics of individual coughs, namely duration and energy, were studied; the duration of a pig cough ranged from 0.24 to 0.74 s and the energy ranged from 40.15 to 822.87 V²·s. The threshold for sound samples was therefore set using the duration range and the lower energy bound of individual coughs. Based on this threshold range, a speech endpoint detection algorithm based on short-time energy was applied to the 30 h of pig farm sound signals that had been preprocessed by the speech enhancement algorithm, and 222 experimental sentences were obtained. The longest was 9.14 s and the shortest was 3.91 s. The 222 sentences contained a total of 1145 sound samples, including 751 pig coughs and 394 non-pig coughs.
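The duration and energy gate described above can be illustrated with a minimal sketch. It assumes that segment energy is the sum of squared amplitudes divided by the sampling rate (which gives units of V²·s); the abstract does not state the exact formula, and the names `segment_energy` and `is_cough_candidate` are hypothetical, not taken from the paper.

```python
# Thresholds quoted in the abstract; the energy formula below is an assumption.
MIN_DURATION_S = 0.24   # shortest observed cough duration (s)
MAX_DURATION_S = 0.74   # longest observed cough duration (s)
MIN_ENERGY = 40.15      # lower energy bound of individual coughs (V^2*s)


def segment_energy(samples, sample_rate):
    """Approximate the integral of x(t)^2 dt as sum(x^2) / sample_rate."""
    return sum(x * x for x in samples) / sample_rate


def is_cough_candidate(samples, sample_rate):
    """Keep a segment only if its duration and energy fall inside the gate."""
    duration = len(samples) / sample_rate
    energy = segment_energy(samples, sample_rate)
    return MIN_DURATION_S <= duration <= MAX_DURATION_S and energy >= MIN_ENERGY
```

A segment that passes this gate is only a candidate; in the paper, the decision between pig cough and non-pig cough is made by the acoustic model.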
Sounds in the pig farm environment, including pig coughs, sneezes, eating sounds, screams, hums and ear-shaking sounds, as well as dog sounds, metal clanging and other background noise, were divided into pig cough and non-pig cough, which were chosen as the acoustic modeling units. The labels of the experimental sentences were obtained with the help of experts. Then 13-dimensional Mel frequency cepstral coefficients (MFCC), reflecting the static characteristics of pig sound, were extracted, and their first-order differential coefficients, reflecting the dynamic characteristics, were appended to obtain 26-dimensional MFCC features, which were used as the characteristic parameters of the experimental sentences. Finally, the bidirectional long short-term memory-connectionist temporal classification (BLSTM-CTC) model was selected to recognize continuous pig sounds. Specifically, the BLSTM network showed excellent ability to learn the features of continuous pig sounds, and CTC could directly model the alignment between the input continuous pig sound sequence and its labels. Through 5-fold cross-validation experiments and analysis, the number of hidden-layer neurons in the BLSTM forward direction, the backward direction, and the fully connected layer was set to 300 each, and the learning rate was set to 0.001. Across the 5 groups, the average recognition rate, error recognition rate and total recognition rate were 92.40%, 3.55% and 93.77%, respectively. Furthermore, an application test of the algorithm was carried out with another 1 h of data, giving a recognition rate of 94.23%, an error recognition rate of 9.09% and a total recognition rate of 93.24%. These results indicate that the pig cough sound recognition model based on continuous speech recognition technology is stable and reliable. This paper provides a reference for the recognition of continuous pig cough sounds and for disease assessment in healthy pig production.
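To illustrate how CTC output maps frame-level predictions to a label sequence without frame-by-frame alignment, here is a minimal sketch of standard best-path (greedy) CTC decoding over the two modeling units. It is not the authors' implementation: the blank index, the label mapping, and the function name `ctc_greedy_decode` are assumptions made for illustration, and the abstract does not say which decoding strategy was used.

```python
# Assumed label layout: index 0 is the CTC blank; 1 and 2 are the two modeling units.
BLANK = 0
LABELS = {1: "cough", 2: "non-cough"}


def ctc_greedy_decode(frame_probs):
    """Best-path decoding: take the argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for idx in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != previous and idx != BLANK:
            decoded.append(LABELS[idx])
        previous = idx
    return decoded


# Per-frame probabilities over (blank, cough, non-cough); values are made up.
example = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.2, 0.7],
]
result = ctc_greedy_decode(example)
```

The blank frame between the two cough runs is what lets two consecutive coughs be emitted as separate events, which matters when the goal is to recognize continuous coughing.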
Keywords:signal processing   acoustic signal   recognition   pig industry   continuous cough   bidirectional long short-term memory-connectionist temporal classification   acoustic model
This article is indexed in CNKI and other databases.