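Topical Analysis of Scientific and Technical Reports Based on the Topical N-Grams Model
Cite this article: AN Xin, XU Shuo. Topical analysis of scientific and technical reports based on topical n-grams model[J]. Journal of Library and Information Sciences in Agriculture, 2019(6): 21-30.
Authors: AN Xin (安欣), XU Shuo (徐硕)
Affiliations: School of Economics and Management, Beijing Forestry University; Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology
Funding: Natural Science Foundation of Guangdong Province project "Research on Methodology and Model Construction for Forecasting Frontier Technologies in the Biomedical Domain" (Grant No. 2018A030313695)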
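Abstract: As one of the important carriers of scientific and technical (S&T) intelligence, S&T reports can reflect the trajectory of S&T development. They can reveal the dynamics of S&T frontiers and even provide insight into where S&T development is heading. In China, research on developing and utilizing S&T reports has so far concentrated on several areas: the publication and distribution of print S&T reports and electronic publications, database construction, service models, and intellectual property. Comparatively little work has addressed deep data mining of these reports. We use the topical n-grams model to perform in-depth topic analysis of S&T reports within a domain. To determine the number of topics for S&T reports in a specific domain, we draw on dynamic programming to derive an efficient method for computing the perplexity of the topical n-grams model. Finally, using 1,344 S&T reports in the tumor (oncology) domain as experimental data, we uncover 70 topics. Representative examples include "molecular mechanisms/tumor cells" and "systems biology/key methods". These results demonstrate that the topical n-grams model is a feasible and effective way to reveal latent domain topics in S&T reports.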

Keywords: scientific and technical reports; topical n-grams model; topic analysis; perplexity; heat map

Topical Analysis of Scientific and Technical Reports based on Topical N-Grams Model
AN Xin,XU Shuo.Topical Analysis of Scientific and Technical Reports based on Topical N-Grams Model[J].Journal of Library and Information Sciences in Agriculture,2019(6):21-30.
Authors: AN Xin, XU Shuo
Institutions: School of Economics and Management, Beijing Forestry University, Beijing 100083, China; Research Base of Beijing Modern Manufacturing Development, College of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract: As one of the important carriers of scientific and technical (S&T) intelligence, S&T reports can reflect the trajectory of S&T development. They can reveal the latest developments at S&T frontiers and even offer insight into S&T development trends. Research on developing and utilizing S&T reports in China has focused mainly on a few areas: the publication and distribution of S&T reports in print and electronic form, database construction, service models, and intellectual property. Deep data mining of S&T reports remains largely under-studied. This work uses the topical n-grams model to discover latent domain topics in S&T reports. To determine the number of topics for S&T reports in a specific domain, this study proposes an efficient, dynamic-programming-based method for computing the perplexity of the topical n-grams model. Finally, 70 domain topics are discovered from 1,344 S&T reports in the tumor domain, such as "molecular mechanisms/tumor cells" and "systems biology/key methods". The experimental results show that discovering latent topics from S&T reports with the topical n-grams model is feasible and effective.
Keywords: scientific and technical reports; topical n-grams model; topical analysis; perplexity; heat map
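The abstract says perplexity for the topical n-grams model is computed efficiently with dynamic programming, but it does not give the derivation. The Python sketch below shows one way such a computation can work. It is an illustration under stated assumptions, not the authors' method.

It assumes the generative process of the topical n-grams model as proposed by Wang, McCallum and Wei:
- for each word i, a bigram indicator x_i is drawn from psi[z_{i-1}][w_{i-1}];
- a topic z_i is drawn from the document's theta;
- w_i is drawn from sigma[z_i][w_{i-1}] if x_i = 1, else from phi[z_i].

Further assumptions: point estimates of theta (for example, from fold-in on held-out documents), phi, sigma and psi are already available, and the first word of a document is always a unigram. The function names `tng_doc_log_likelihood` and `perplexity` are hypothetical.

The forward variable is alpha_i(k) = p(w_1..w_i, z_i = k). Because x_i depends only on z_{i-1} and w_{i-1}, each step needs O(K) work instead of an enumeration over all topic and indicator sequences. Perplexity is then exp(-Σ_d log p(w_d) / Σ_d N_d).

```python
import math
from typing import List, Sequence


def tng_doc_log_likelihood(
    words: Sequence[int],
    theta: Sequence[float],
    phi: Sequence[Sequence[float]],
    sigma: Sequence[Sequence[Sequence[float]]],
    psi: Sequence[Sequence[float]],
) -> float:
    """log p(w_1..w_N) for one document under a topical n-grams model (sketch).

    theta[k]       : topic proportions of this document (assumed given)
    phi[k][w]      : unigram probability of word w under topic k
    sigma[k][v][w] : bigram probability of w following word v under topic k
    psi[k][v]      : probability that the word after v, when v was drawn
                     under topic k, continues a bigram (x = 1)
    """
    if not words:
        return 0.0
    K = len(theta)
    # Assumption: the first word has no predecessor, so it is a unigram (x_1 = 0).
    alpha: List[float] = [theta[k] * phi[k][words[0]] for k in range(K)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]  # alpha[k] = p(z_1 = k | w_1)

    for i in range(1, len(words)):
        prev, cur = words[i - 1], words[i]
        # x_i depends only on (z_{i-1}, w_{i-1}), so the sum over the previous
        # topic collapses into two scalars: probability mass routed to the
        # bigram branch and to the unigram branch.
        bigram_mass = sum(alpha[j] * psi[j][prev] for j in range(K))
        unigram_mass = sum(alpha[j] * (1.0 - psi[j][prev]) for j in range(K))
        # Unnormalized forward step: p(w_i, z_i = k | w_1..w_{i-1}).
        step = [
            theta[k] * (bigram_mass * sigma[k][prev][cur] + unigram_mass * phi[k][cur])
            for k in range(K)
        ]
        scale = sum(step)  # p(w_i | w_1..w_{i-1})
        log_lik += math.log(scale)
        alpha = [s / scale for s in step]  # rescale to avoid numeric underflow
    return log_lik


def perplexity(docs, thetas, phi, sigma, psi) -> float:
    """exp(-sum_d log p(w_d) / sum_d N_d) over a set of held-out documents."""
    total_ll = sum(
        tng_doc_log_likelihood(d, t, phi, sigma, psi) for d, t in zip(docs, thetas)
    )
    n_tokens = sum(len(d) for d in docs)
    return math.exp(-total_ll / n_tokens)
```

Summing explicitly over every topic and indicator sequence costs time exponential in document length. The forward pass is linear in document length and in the number of topics. To choose the number of topics, one would typically compute held-out perplexity for several candidate values and look for where it is lowest or stops improving.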
This article has been indexed by VIP (维普) and other databases.