
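Named entity recognition of aquatic animal disease diagnosis and treatment for knowledge graph construction
Cite this article: Liu Jusheng, Yang Huining, Sun Zhetao, Yang He, Shao Liming, Yu Hong, Zhang Sijia, Ye Shigen. Named entity recognition of aquatic animal disease diagnosis and treatment for knowledge graph construction[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(7): 210-217.
Authors: Liu Jusheng  Yang Huining  Sun Zhetao  Yang He  Shao Liming  Yu Hong  Zhang Sijia  Ye Shigen
Affiliations: 1. College of Information Engineering, Dalian Ocean University, Dalian 116023; 2. Key Laboratory of Environment Controlled Aquaculture (Dalian Ocean University), Ministry of Education, Dalian 116023; 3. Key Laboratory of Marine Information Technology of Liaoning Province, Dalian 116023; 4. College of Fisheries and Life, Dalian Ocean University, Dalian 116023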
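Funding: Open Project of the Key Laboratory of Environment Controlled Aquaculture, Ministry of Education (2021-MOEKLECA-KF-05); Young Scientists Fund of the National Natural Science Foundation of China (61802046)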
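Abstract: Disease diagnosis and treatment is an important support for healthy aquatic animal farming, knowledge graphs are an effective means of representing and applying knowledge about aquatic animal disease diagnosis and treatment, and named entity recognition is the key to constructing such a knowledge graph. To address the low recognition accuracy caused by polysemy, entity nesting, and similar problems, this study proposes an entity recognition model that fuses BERT (Bidirectional Encoder Representations from Transformers) with CaBiLSTM (Cascade Bi-directional Long Short-Term Memory). First, a dedicated corpus for aquatic animal disease diagnosis and treatment is built, and its data are used to train the designed model. Second, the CaBiLSTM model is designed following a "layered" idea to recognize nested entities: dimension-reduced inner entity features are used to improve the discriminability of outer entities, and the BERT model is introduced to add entity position information. Finally, comparative experiments are carried out to verify the effectiveness of the proposed method. The results show that the fused BERT and CaBiLSTM model reaches an accuracy, recall, and F1 value of 93.07%, 92.85%, and 92.96%, respectively, on named entity recognition for aquatic animal disease diagnosis and treatment. The study indicates that the model can effectively address the low recognition accuracy caused by polysemy and entity nesting in this task, can improve the construction quality of the aquatic animal disease diagnosis and treatment knowledge graph, and can promote the development of healthy aquaculture.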

Keywords: models  aquaculture  knowledge graph  named entity recognition  nested entity  BERT  BiLSTM
Received: 2021/9/27
Revised: 2022/3/25

Named-entity recognition for the diagnosis and treatment of aquatic animal diseases using knowledge graph construction
Liu Jusheng, Yang Huining, Sun Zhetao, Yang He, Shao Liming, Yu Hong, Zhang Sijia, Ye Shigen. Named-entity recognition for the diagnosis and treatment of aquatic animal diseases using knowledge graph construction[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(7): 210-217.
Authors: Liu Jusheng  Yang Huining  Sun Zhetao  Yang He  Shao Liming  Yu Hong  Zhang Sijia  Ye Shigen
Institution: 1. College of Information Engineering, Dalian Ocean University, Dalian 116023, China; 2. Key Laboratory of Environment Controlled Aquaculture, Ministry of Education, Dalian 116023, China; 3. Key Laboratory of Marine Information Technology of Liaoning Province, Dalian 116023, China; 4. College of Fisheries and Life, Dalian Ocean University, Dalian 116023, China
Abstract: Disease diagnosis and treatment is an important support for healthy aquatic animal farming. A knowledge graph is an effective way to represent and apply knowledge about aquatic animal disease diagnosis and treatment, and named entity recognition is a key step in constructing such a graph. However, polysemy and entity nesting lower the recognition accuracy of named entities in this domain. This study proposes a named entity recognition model for aquatic animal disease diagnosis and treatment based on BERT+CaBiLSTM+CRF (Bidirectional Encoder Representations from Transformers + Cascade Bi-directional Long Short-Term Memory + Conditional Random Field). Firstly, the BERT features contain position vector information, which helps distinguish the different meanings that an entity expresses in different contexts and so mitigates polysemy. Secondly, the CaBiLSTM model was designed for nested named entity recognition following a "hierarchical" idea, because the inner entity of a nested entity in aquatic medicine contributes greatly to the recognition of the outer entity. A BiLSTM+CRF model first identifies the frequently occurring inner entities. The feature matrix of the identified inner entities is then reduced in dimension and concatenated with the outer entity feature matrix, so that the complete inner entity feature information is retained. A second BiLSTM+CRF model then recognizes the outer entities, with the added inner features improving their discrimination. Finally, a comparative experiment was designed to verify the effectiveness of the proposed method.
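The cascade step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two BiLSTM+CRF taggers are replaced by hand-written tag sequences, the chunk-averaging `reduce_dim` is only a stand-in for whatever dimension reduction the paper uses, and the labels `SITE` and `SYMPTOM`, the example tokens, and all function names are hypothetical. It shows only the data flow: inner-layer tags select the tokens whose reduced features are concatenated onto the outer-layer input.

```python
from typing import List, Tuple


def decode_bio(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn a BIO tag sequence into (start, end_exclusive, label) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((start, i, label))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def reduce_dim(vec: List[float], k: int) -> List[float]:
    """Stand-in reduction (assumption): average consecutive chunks down to k values.

    Requires len(vec) to be divisible by k.
    """
    size = len(vec) // k
    return [sum(vec[j * size:(j + 1) * size]) / size for j in range(k)]


def build_outer_inputs(features: List[List[float]],
                       inner_tags: List[str],
                       k: int) -> List[List[float]]:
    """Concatenate reduced inner-entity features onto each token's features.

    Tokens outside any inner entity receive a zero vector of length k.
    """
    outer = []
    for vec, tag in zip(features, inner_tags):
        inner_part = reduce_dim(vec, k) if tag != "O" else [0.0] * k
        outer.append(vec + inner_part)
    return outer


# Hypothetical nested example: a symptom that contains a disease site.
tokens = ["gill", "filaments", "swollen"]
inner_tags = ["B-SITE", "I-SITE", "O"]                 # inner layer output
outer_tags = ["B-SYMPTOM", "I-SYMPTOM", "I-SYMPTOM"]   # outer layer output

# Toy 4-dimensional token features standing in for BERT/BiLSTM outputs.
features = [[1.0, 3.0, 5.0, 7.0], [2.0, 2.0, 4.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
outer_inputs = build_outer_inputs(features, inner_tags, k=2)

inner_spans = decode_bio(inner_tags)
outer_spans = decode_bio(outer_tags)
```

Here the inner span covers the first two tokens and the outer span covers all three, so the outer layer sees six-dimensional inputs in which the last two values carry inner-entity information only for tokens inside the disease site.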
The test results show that the accuracy, recall, and F1 value of the BERT+CaBiLSTM+CRF model on the aquatic medicine named entity recognition task reached 93.07%, 92.85%, and 92.96%, respectively. In terms of specific entity categories, the five types of non-nested entities (aquatic animal names, drug names, disease names, disease sites, and pathogens) have prominent structural features and were recognized more accurately than nested entities. For example, most aquatic animal names contain radicals such as "worm" and "fish", drug names are mostly composed of chemical elements, and disease names mostly end with the word "disease". For nested named entities such as clinical symptoms, whose nested structure is prominent, the model integrating BERT and CaBiLSTM under the "hierarchical idea" achieved higher recognition than before: accuracy, recall, and F1 value increased by 12.31, 12.76, and 12.53 percentage points, respectively. Therefore, the model can be expected to effectively address the low recognition accuracy caused by polysemy and entity nesting in named entity recognition for aquatic animal disease diagnosis and treatment. The findings can support the construction of knowledge graphs in the fisheries field and further promote healthy aquaculture projects.
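As a quick consistency check on the reported figures, the sketch below recomputes F1 as the harmonic mean of the first two metrics. It assumes that the value the abstract calls "accuracy" is precision, as is usual when it is reported alongside recall and F1. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 93.07  # assumption: the reported "accuracy" is precision
reported_recall = 92.85
reported_f1 = 92.96

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # 92.96
```

Under that assumption the recomputed value agrees with the reported F1 to two decimal places.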
Keywords:models  aquaculture  knowledge graph  named entity recognition  nested entity  BERT  BiLSTM