
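Research Progress on Author Name Disambiguation Methods for Academic Papers
Cite this article: WANG Xin, LU Yao, YUAN Xue, ZHAO Wanjing, CHEN Li, LIU Minjuan. Research progress on author name disambiguation methods for academic papers[J]. Journal of Library and Information Sciences in Agriculture, 2022, 34(10): 82-90.
Authors: WANG Xin  LU Yao  YUAN Xue  ZHAO Wanjing  CHEN Li  LIU Minjuan
Institution: Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
Funding: 2022 Science and Technology Innovation Project of the Agricultural Information Institute, Chinese Academy of Agricultural Sciences, "Digital CAAS 3.0 Construction" (CAAS-ASTIP-2016-AII)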
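Abstract: [Purpose/Significance] To survey recent research on author name disambiguation, clarify how the field has developed, and provide a reference for subsequent research. [Method/Process] Literature on author name disambiguation was retrieved from Web of Science, Scopus, Google Scholar, ACM, IEEE, Elsevier, Springer, CNKI, the VIP database and the Wanfang database, and 46 representative publications were selected for review. The development of the field is examined and organized from the perspective of how data influence author name disambiguation methods. [Results/Conclusions] The methods are divided into three categories according to the characteristics of the data on which the disambiguation task relies. As technology has advanced, deep learning methods have been widely adopted. Compared with improvements to the model, deep-learning-based feature learning and representation improve the performance of author name disambiguation algorithms more markedly. At the same time, to make full use of the various kinds of information contained in the data, the three categories of algorithms are increasingly combined so that they complement one another. Judging from the literature surveyed, follow-up research could address incremental disambiguation and cross-language disambiguation.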

Keywords: knowledge organization  author name disambiguation  person name disambiguation
Received: 2021-11-22

A Survey of Author Name Disambiguation Techniques of Academic Papers
WANG Xin,LU Yao,YUAN Xue,ZHAO Wanjing,CHEN Li,LIU Minjuan.A Survey of Author Name Disambiguation Techniques of Academic Papers[J].Journal of Library and Information Sciences in Agriculture,2022,34(10):82-90.
Authors:WANG Xin  LU Yao  YUAN Xue  ZHAO Wanjing  CHEN Li  LIU Minjuan
Institution:Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081
Abstract: [Purpose/Significance] This paper surveys research on author name disambiguation published in recent years and reviews its development from the perspective of how data affect author name disambiguation methods, so as to provide a reference for further research. [Method/Process] Papers on author name disambiguation were collected from English-language research databases such as Web of Science, Scopus, Google Scholar, ACM Digital Library, IEEE Xplore, ScienceDirect and Springer Link, and from Chinese research databases such as CNKI, CQVIP and WANFANG. The search results cover relevant papers published from 1998 to 2021. Balancing authority, influence and novelty, 46 publications were selected for review. Author name disambiguation data come in many types and structures. For example, literature feature information is generally presented as unstructured text, and the features extracted from it can be stored and represented in two-dimensional tables; citation information and interpersonal relationships are network (relational) data, which can be stored and represented as graphs, key-value pairs or two-dimensional tables. The fundamental reason the data structures differ lies in their semantic differences, but the data structure itself determines which algorithms are applicable. According to the structure of the feature data used in the author name disambiguation task and the corresponding data processing algorithms, the relevant research is divided into three categories: 1) disambiguation methods based on literature features, 2) disambiguation methods based on social networks, and 3) disambiguation methods that integrate external knowledge. The impact of data on author name disambiguation methods is thus examined at the data level. [Results/Conclusions] The analysis found that, as technology has progressed, deep learning methods have come into wide use. Feature learning and representation based on deep learning improve the performance of author name disambiguation algorithms more markedly than improvements to the model do. In addition, to overcome the insufficient use of data by any single method and to use the data more efficiently, the three categories of methods are increasingly combined so that they complement one another. The literature review also shows that there are few studies on incremental author name disambiguation and multi-language author name disambiguation, which could be directions for further research.
Keywords:knowledge organization  author name disambiguation  person name disambiguation  
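The abstract notes that co-authorship and citation information is network data that can be held as a graph, as key-value pairs or as a two-dimensional table. As a rough illustration of the second category (social-network-based disambiguation), the sketch below stores papers as table-like records, builds a key-value view from co-author name to paper, and groups papers that share a co-author directly or transitively. It is not a method taken from the surveyed paper: the records, the ambiguous name, the field names and the function `cluster_by_shared_coauthors` are all hypothetical, and real systems would combine such links with literature features and external knowledge.

```python
from collections import defaultdict

# Hypothetical records: each row is one paper carrying the same ambiguous
# author name (say "Wei Zhang"). All values are invented for illustration.
papers = [
    {"id": "p1", "coauthors": {"L. Chen", "M. Liu"}, "venue": "Journal A"},
    {"id": "p2", "coauthors": {"M. Liu", "Y. Lu"}, "venue": "Journal A"},
    {"id": "p3", "coauthors": {"H. Tanaka"}, "venue": "Journal B"},
    {"id": "p4", "coauthors": {"H. Tanaka", "K. Sato"}, "venue": "Journal C"},
    {"id": "p5", "coauthors": set(), "venue": "Journal D"},
]


def cluster_by_shared_coauthors(records):
    """Group papers linked, directly or transitively, by a shared co-author."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Key-value view of the network: co-author name -> first paper seen with it.
    first_paper = {}
    for r in records:
        for name in sorted(r["coauthors"]):
            if name in first_paper:
                union(first_paper[name], r["id"])
            else:
                first_paper[name] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


clusters = cluster_by_shared_coauthors(papers)
print(clusters)  # [['p1', 'p2'], ['p3', 'p4'], ['p5']]
```

Each resulting cluster is a candidate for one real person. A paper with no co-authors, like `p5` here, stays alone, which is one reason a purely network-based approach tends to be combined with the other two categories.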