
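Construction and verification of the visual knowledge map of aquatic diseases based on deep learning
Cite this article:JIANG Lihua,ZHAO Ruixue,DONG Chunyan,CHANG Xiaoyan,MA Juanjuan,XIE Nengfu,FANG Song.Construction and verification of the visual knowledge map of aquatic diseases based on deep learning[J].Transactions of the Chinese Society of Agricultural Engineering,2023,39(15):259-267.
Authors:JIANG Lihua  ZHAO Ruixue  DONG Chunyan  CHANG Xiaoyan  MA Juanjuan  XIE Nengfu  FANG Song
Institution:Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081, China;Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China;Information Center, Ministry of Agriculture and Rural Affairs, Beijing 100025, China;Beijing Aerospace Fengyi Information Technology Co., Ltd., Beijing 100081, China;Chinese Academy of Agricultural Sciences, Beijing 100081, China
Funding:Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0113702-02);National Social Science Fund of China project "Research on the diffusion and path optimization of China's agricultural science and technology policy" (20CTQ019)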
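Abstract:A knowledge graph is essentially a graph-based semantic network that represents the relationships between entities, and it plays a vital role in areas such as knowledge question answering and semantic retrieval. The aquatic disease domain currently suffers from cross-linked entity relationships, poor aggregation of multi-source heterogeneous data, low data utilization, and difficulty in sharing knowledge. To address these problems, this study draws on natural language processing and text mining to propose a method for constructing a domain knowledge graph of aquatic diseases based on a deep neural network model, and validates it experimentally. First, a domain ontology of aquatic diseases is built, and the sets of entity types, attributes, and relations are predefined to fix the boundary of knowledge extraction. Second, on the basis of the ontology, rule-based methods and deep learning methods are used to extract semi-structured and unstructured knowledge, respectively. For unstructured knowledge, an "aquatic disease + relation + BMES" text annotation scheme is proposed. It folds relation extraction into the named entity recognition task and models triples directly, turning entity-relation extraction into a sequence labeling problem; this not only improves annotation efficiency but also achieves joint extraction of entities and relations. In addition, triples are assembled through label matching and mapping to obtain RDF data, which resolves the problem of overlapping relation extraction. Experiments with the end-to-end BERT-BiLSTM+CRF model show that this triple extraction method achieves high recall (89.64%), accuracy (94.04%), and F1 score (91.34%), outperforming models such as CNN+BiLSTM+CRF and BiLSTM+CRF, a marked improvement in extraction quality. The extracted knowledge is stored in the Neo4j graph database to support visual knowledge management and knowledge reasoning and analysis. The resulting aquatic disease knowledge graph is accurate and fine-grained; it can help machines understand data, explain phenomena, and reason over knowledge, thereby uncovering deeper relationships and enabling intelligent search and intelligent interaction.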
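The label matching and mapping step that turns a tagged sequence into triples can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the tag format `<LABEL>-<B|M|E|S>`, the head label `DIS`, the relation name `SYMPTOM`, the function names, and the toy sentence are all assumptions made for illustration. The abstract only states that the scheme combines the aquatic disease, the relation, and BMES position markers.

```python
from typing import List, Tuple

Span = Tuple[str, str]            # (label, surface text)
Triple = Tuple[str, str, str]     # (head entity, relation, tail entity)


def decode_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Span]:
    """Collect labelled spans from BMES tags (B=begin, M=middle, E=end, S=single)."""
    spans: List[Span] = []
    buf: List[str] = []
    cur = None  # label of the span currently being built
    for tok, tag in zip(tokens, tags):
        if tag == "O" or "-" not in tag:
            buf, cur = [], None
            continue
        label, pos = tag.rsplit("-", 1)
        if pos == "S":
            spans.append((label, tok))
            buf, cur = [], None
        elif pos == "B":
            buf, cur = [tok], label
        elif pos in ("M", "E") and cur == label and buf:
            buf.append(tok)
            if pos == "E":
                spans.append((label, sep.join(buf)))
                buf, cur = [], None
        else:
            # Ill-formed sequence (e.g. M without a preceding B): drop it.
            buf, cur = [], None
    return spans


def build_triples(spans: List[Span], head_label: str = "DIS") -> List[Triple]:
    """Pair each relation-labelled span with the most recent disease span."""
    triples: List[Triple] = []
    head = None
    for label, text in spans:
        if label == head_label:
            head = text
        elif head is not None:
            # Assumed convention: the tail's label doubles as the relation name.
            triples.append((head, label, text))
    return triples


# Toy example with invented tags; real data would come from the tagging model.
tokens = ["gill", "rot", "causes", "gill", "filament", "decay", "and", "anorexia"]
tags = ["DIS-B", "DIS-E", "O", "SYMPTOM-B", "SYMPTOM-M", "SYMPTOM-E", "O", "SYMPTOM-S"]
spans = decode_spans(tokens, tags)
triples = build_triples(spans)
```

Because every tail span is attached to the disease head independently, one disease can take part in several triples within a sentence, which is one simple way a tagging scheme of this kind could cover relations that share an entity.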

Keywords:knowledge graph  deep learning  aquatic diseases  ontology  BiLSTM+CRF
Received:2023/4/14
Revised:2023/6/9

Construction and verification of the visual knowledge map of aquatic diseases based on deep learning
JIANG Lihua,ZHAO Ruixue,DONG Chunyan,CHANG Xiaoyan,MA Juanjuan,XIE Nengfu,FANG Song.Construction and verification of the visual knowledge map of aquatic diseases based on deep learning[J].Transactions of the Chinese Society of Agricultural Engineering,2023,39(15):259-267.
Authors:JIANG Lihua  ZHAO Ruixue  DONG Chunyan  CHANG Xiaoyan  MA Juanjuan  XIE Nengfu  FANG Song
Institution:Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Beijing 100081, China;Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China;Information Center, Ministry of Agriculture and Rural Affairs, Beijing 100025, China;Beijing Aerospace Fengyi Information Technology Co., Ltd., Beijing 100081, China;Chinese Academy of Agricultural Sciences, Beijing 100081, China
Abstract:Aquatic diseases have occurred frequently in recent years as the aquaculture industry has developed rapidly, and once they spread they pose a serious risk to aquaculture. With the development of Internet technology, online data on aquatic diseases has also become highly dispersed, multi-source, and heterogeneous. The explosive growth of this data creates strong demand for obtaining the required information quickly and accurately, a demand that traditional search engines cannot fully meet: keyword retrieval or shallow semantic analysis returns a large number of related web links, leading to vague and redundant answers. An intelligent question-answering (Q&A) system instead accepts the user's natural language input, captures the user's intent, and returns concise and accurate answers. The emergence and rapid development of knowledge graphs provides a high-quality knowledge base for such systems and promotes their application in various fields. Knowledge graph construction can be divided into four main steps: data acquisition, ontology construction, knowledge extraction, and storage. First, crawler technology is used to obtain aquatic disease data, which is then preprocessed through cleaning and analysis. Second, an aquatic disease ontology is constructed from the content and representation characteristics of the data, predefining the types of relations and properties between entities and thereby clarifying the boundaries of knowledge extraction. Third, rule-based logic is used to extract semi-structured data, and joint entity and relation extraction is used for unstructured data. Finally, the extracted triples are stored in the Neo4j graph database to realize visual management of the knowledge graph and a certain degree of knowledge reasoning.
For unstructured knowledge, this study proposed a new text annotation scheme, "aquatic disease + relationship + BMES". Relation extraction was integrated into the named entity recognition task and triples were modeled directly, transforming entity-relation extraction into a sequence labeling problem. This at least doubled annotation efficiency and achieved joint extraction of entities and relations. Triples were then assembled by label matching and mapping, which solved the problem of overlapping relation extraction. The end-to-end BERT-BiLSTM+CRF model was used for the experiments. The results showed that triple extraction achieved high recall (89.64%), accuracy (94.04%), and F1 score (91.34%), significantly better than the CNN+BiLSTM+CRF and BiLSTM+CRF models. The extracted knowledge was stored in the Neo4j graph database, enabling visual knowledge management and knowledge reasoning analysis. The resulting aquatic disease knowledge graph has high precision and fine granularity, and the findings offer a new approach for intelligent Q&A. The semi-automatic construction of knowledge graphs can also provide technical support for recommendation systems, search, and the construction and application of knowledge bases.
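As a rough illustration of the storage step, an extracted triple can be mapped to a parameterized Cypher `MERGE` statement before being sent to Neo4j. This is a sketch under stated assumptions rather than the schema used in the study: the node label `Entity`, the `name` property, the helper name `triple_to_cypher`, and the sample triple are all hypothetical. Cypher does not allow relationship types to be passed as query parameters, so the sketch validates the relation name and interpolates it into the query text.

```python
import re
from typing import Dict, Tuple

# Only plain identifiers are accepted as relationship types, because the
# type is interpolated into the query text rather than passed as a parameter.
_REL_TYPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head: str, relation: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Return a Cypher query and its parameters for one (head, relation, tail) triple."""
    if not _REL_TYPE.match(relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "   # hypothetical label and property
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Invented sample triple, for illustration only.
query, params = triple_to_cypher("gill rot", "SYMPTOM", "anorexia")
```

Using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes or edges. With the official Neo4j Python driver, the pair could then be executed through `session.run(query, params)`; connection details are omitted here because the source does not describe them.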
Keywords:knowledge graph  ontology  deep learning  fish diseases  BiLSTM+CRF