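Multi-feature fusion named entity recognition method for grape knowledge graph construction
Cite this article: NIE Xiaolin, ZHANG Lilin, NIU Dangdang, WU Huarui, ZHU Huaji, ZHANG Hongming. Multi-feature fusion named entity recognition method for grape knowledge graph construction[J]. Transactions of the Chinese Society of Agricultural Engineering, 2024, 40(3): 201-210
Authors: NIE Xiaolin  ZHANG Lilin  NIU Dangdang  WU Huarui  ZHU Huaji  ZHANG Hongming
Affiliations: College of Information Engineering, Northwest A&F University, Yangling 712100; College of Information Engineering, Northwest A&F University, Yangling 712100; Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100; Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097; National Engineering Research Center for Agricultural Information Technology, Beijing 100097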
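Funding: National Key Research and Development Program of China (2020YFD1100601); Key Research and Development Program of Shaanxi Province (2023-YBNY-217); Shaanxi Qinchuangyuan Team Building Project (2023-ZDLNY-69)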
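Abstract: During knowledge graph construction, complex contexts and the relatively homogeneous semantic representation of character vectors in existing models lead to low recognition rates for domain-specific entities. To address this problem, this study proposed a named entity recognition model that fuses bi-directional encoder representation from transformer (BERT) with a residual structure (RS), named BBNER-RS (BERT-based named entity recognition with residual structure). BERT maps the text into character vectors, a bi-directional long short-term memory network (BiLSTM) extracts local character-vector features, and the RS retains the global character-vector features provided by BERT, which increases the semantic richness of the character vectors. Finally, a conditional random field (CRF) model decodes the feature vectors to obtain the globally optimal sequence labeling. Compared with other named entity recognition models, the proposed BBNER-RS model performed well on the grape dataset, and its F1 values on the grape, People's Daily, BOSON, RESUME and Weibo datasets reached 89...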
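The CRF decoding step mentioned above, which picks the globally optimal label sequence rather than the best label for each character independently, is usually carried out with the Viterbi algorithm. The sketch below is a minimal pure-Python version under stated assumptions: the function name `viterbi_decode`, the BIO tag set `O`/`B-VAR`/`I-VAR` and all score values are hypothetical and not taken from the paper. In the model, the emission scores would come from the fused BERT/BiLSTM features and the transition scores would be learned CRF parameters.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring label path and its score for a linear-chain CRF.

    emissions[t][j]  : score of label j at character position t
    transitions[i][j]: score of moving from label i to label j
    """
    if not emissions:
        return [], 0.0
    num_labels = len(transitions)
    # score[j]: best score of any partial path that ends in label j
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(num_labels):
            # Pick the previous label that gives the best score for reaching j
            best_i = max(range(num_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label
    best_last = max(range(num_labels), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


if __name__ == "__main__":
    # Hypothetical BIO tag set for a single entity type (grape variety)
    labels = ["O", "B-VAR", "I-VAR"]
    NEG = -1e4  # effectively forbids the invalid transition O -> I-VAR
    transitions = [
        [0.0, 0.0, NEG],  # from O
        [0.0, 0.0, 1.0],  # from B-VAR
        [0.0, 0.0, 1.0],  # from I-VAR
    ]
    # A greedy per-character argmax would output O, I-VAR (an invalid sequence)
    emissions = [[2.0, 0.0, 0.0], [0.0, 0.0, 1.5]]
    path, score = viterbi_decode(emissions, transitions)
    print([labels[k] for k in path], score)  # ['B-VAR', 'I-VAR'] 2.5
```

The demo shows why sequence-level decoding matters: the transition scores steer the output away from the invalid `O -> I-VAR` pair that a per-character argmax would produce.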

Keywords: informatization  deep learning  knowledge graph  named entity recognition  BERT  residual structure
Received: 2023-06-19
Revised: 2024-01-22

Multi-feature fusion named entity recognition method for grape knowledge graph construction
NIE Xiaolin,ZHANG Lilin,NIU Dangdang,WU Huarui,ZHU Huaji,ZHANG Hongming. Multi-feature fusion named entity recognition method for grape knowledge graph construction[J]. Transactions of the Chinese Society of Agricultural Engineering, 2024, 40(3): 201-210
Authors:NIE Xiaolin  ZHANG Lilin  NIU Dangdang  WU Huarui  ZHU Huaji  ZHANG Hongming
Affiliation: College of Information Engineering, Northwest A&F University, Yangling 712100, China; College of Information Engineering, Northwest A&F University, Yangling 712100, China; Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China; Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Agricultural Information Technology, Beijing 100097, China
Abstract: A domain knowledge graph stores data in a structured, fine-grained form and models the real world as triples. It can organize dispersed knowledge effectively, and knowledge graphs have been widely used in healthcare, finance and the Internet. The grape is one of the most economically important fruit crops in agriculture. However, much of the knowledge in the grape domain is unstructured, which limits its use in downstream data-driven tasks, and knowledge graphs remain rare in the agricultural domain. It is therefore necessary to construct a grape-domain knowledge graph, particularly for knowledge storage and sharing. When a domain knowledge graph is constructed, key information is often implicit in complex contexts. The semantic representations of character vectors in existing named entity recognition (NER) models are relatively homogeneous, which leads to low recognition rates for domain-specific entities and ultimately reduces the efficiency and quality of knowledge graph construction. In this study, an NER model fusing Bi-directional Encoder Representation from Transformer (BERT) and a Residual Structure (RS) was proposed. First, BERT mapped the raw text into character vectors: each input sentence was embedded with token, segment and position embeddings. On these embedded vectors, BERT's multi-head attention mechanism computed the correlation between each character and every other character in the sentence and adjusted the weights accordingly, which gives the character vectors produced by BERT global features. A Bi-directional Long Short-Term Memory network (BiLSTM) then extracted deep local features from the BERT character vectors in both the forward and backward directions.
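The two BERT steps described above can be sketched in a few lines. Each character's input representation is the element-wise sum of its token, segment and position embeddings, and self-attention then weights every character by the softmax-normalized, scaled dot-product similarity between its vector and the vector of each character in the sentence. The function names, the toy vectors and the "identity projection" simplification below are illustrative assumptions. Real BERT uses learned embedding tables, learned query/key/value and output projections for each head, and 768-dimensional hidden vectors with 12 heads in the base model, all of which this sketch omits.

```python
import math
from typing import List, Sequence

Vector = List[float]


def input_embedding(token_vec: Sequence[float], segment_vec: Sequence[float],
                    position_vec: Sequence[float]) -> Vector:
    """BERT-style input representation: element-wise sum of the three embeddings."""
    return [t + s + p for t, s, p in zip(token_vec, segment_vec, position_vec)]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Sequence[Sequence[float]], num_heads: int) -> List[Vector]:
    """Multi-head scaled dot-product self-attention with identity projections.

    Each head attends over its own slice of the feature dimension and the head
    outputs are concatenated. The learned query/key/value and output
    projections of real BERT are omitted.
    """
    dim = len(x[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    out = [[0.0] * dim for _ in x]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        for i, query in enumerate(x):
            # Scaled dot product between character i and every character j
            scores = [
                sum(q * k for q, k in zip(query[lo:hi], key[lo:hi])) / math.sqrt(head_dim)
                for key in x
            ]
            weights = softmax(scores)
            # Output for character i: attention-weighted sum of all characters
            for d in range(lo, hi):
                out[i][d] = sum(w * v[d] for w, v in zip(weights, x))
    return out
```

For example, `self_attention([[1.0, 0.0], [0.0, 1.0]], num_heads=1)` returns rows whose entries sum to 1 and lean toward each character's own vector, because each character is most similar to itself. This mixing of information from the whole sentence is what the abstract calls the global features of the BERT character vectors.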
Two simple but effective residual structures were designed to fuse the global features provided by BERT with the deep local features provided by BiLSTM. The mapping residual structure projected the BERT feature vectors into a lower dimension to preserve as much of BERT's original information as possible, while the convolution residual structure applied two successive convolutions to the feature vectors to extract more information. The fused feature vectors were then decoded by a Conditional Random Field (CRF) model. Compared with the other models, the BERT-based Named Entity Recognition with Residual Structure (BBNER-RS) model performed best on the grape dataset, with an F1 value of 89.93%. The model also showed some generalization ability, with F1 values of 95.02%, 83.21%, 96.15% and 72.51% on the People's Daily, BOSON, RESUME and Weibo datasets, respectively. A two-stage, deep-learning-based method for domain knowledge graph construction was also proposed: in the first stage, a domain ontology was constructed; in the second stage, a deep learning model extracted knowledge under the constraints of the ontology and built triples. BBNER-RS performed best when constructing triples from unstructured text, with an F1 value of 86.44%. Finally, BBNER-RS was used to construct a grape knowledge graph. The findings can provide technical and data support for the standardization and sharing of domain data.
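The mapping residual structure can be read as a shortcut connection: the BERT character vectors are projected down to the BiLSTM output size and added to the BiLSTM features, so that the global BERT information reaches the CRF layer alongside the local features. The abstract does not give the exact formulation, so the sketch below assumes a plain linear projection followed by element-wise addition; the function names `linear` and `mapping_residual`, the weights and all dimensions are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Compute W x + b, where weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def mapping_residual(bert_feats: Sequence[Sequence[float]],
                     bilstm_feats: Sequence[Sequence[float]],
                     weight: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> Matrix:
    """Fuse global (BERT) and local (BiLSTM) features for each character.

    Assumed form: fused_t = bilstm_t + (W @ bert_t + b), where W maps the
    BERT dimension down to the BiLSTM output dimension.
    """
    if len(bert_feats) != len(bilstm_feats):
        raise ValueError("BERT and BiLSTM sequences must have the same length")
    fused: Matrix = []
    for global_vec, local_vec in zip(bert_feats, bilstm_feats):
        projected = linear(global_vec, weight, bias)  # reduced-dimension shortcut
        if len(projected) != len(local_vec):
            raise ValueError("projection size must match the BiLSTM feature size")
        fused.append([l + g for l, g in zip(local_vec, projected)])
    return fused
```

With a 4-dimensional BERT vector and a 2-dimensional BiLSTM output, `mapping_residual([[1.0, 2.0, 3.0, 4.0]], [[0.5, -0.5]], [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0])` returns `[[1.5, 3.5]]`. The convolution residual structure would replace the single projection with two stacked convolutions; its kernel sizes and channel counts are not given in the abstract, so it is not sketched here.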
Keywords:informatization  deep learning  knowledge graph  named entity recognition  BERT  residual structure