
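Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification
Cite this article: DUAN Qingling, ZHANG Lu, LIU Yiran, WANG Shasha. Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification[J]. Transactions of the Chinese Society of Agricultural Machinery (农业机械学报), 2018, 49(7): 160-167.
Authors: DUAN Qingling, ZHANG Lu, LIU Yiran, WANG Shasha
Affiliations: College of Information and Electrical Engineering, China Agricultural University; Beijing Nongxintong Technology Co., Ltd. (北京农信通科技有限责任公司)
Funding: National High Technology Research and Development Program of China (863 Program) (2013AA102306) and the National Key Technology R&D Program of the 12th Five-Year Plan (2012BAD35B06)
Abstract: Hot word extraction is important for monitoring and analyzing agricultural public opinion. Existing research provides a foundation, but current methods are poorly targeted and cannot meet the individual needs of user groups in different agricultural industries. To address this, an automatic hot word extraction method based on agricultural network information classification is proposed. First, a multi-label classification algorithm classifies the text corpus, and a separate corpus is built for each category. Next, an information-entropy-based method extracts hot word candidates for each category. Finally, a time-variation-based method computes the heat of each candidate, and the hot words are obtained from the candidates ranked by heat. In an experiment on 15 354 texts collected from agricultural websites, hot word extraction accuracy exceeded 0.9, indicating that the method extracts agricultural hot words with high quality and helps different agricultural user groups discover and analyze industry hot topics.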

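Keywords: agricultural network information; agricultural public opinion monitoring; hot word; multi-label classification; heat calculation
Received: 2017/12/15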
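
The abstract names an information-entropy-based step for extracting hot word candidates but does not give its formula. One common reading is branch (left/right neighbor) entropy: an n-gram whose neighboring tokens vary widely is likely a free-standing word or phrase. The sketch below illustrates that reading only. The function names, the `n`, `min_count`, and `min_entropy` parameters, and the use of the minimum of the two entropies are all assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def branch_entropy(neighbors):
    """Shannon entropy (in bits) of a Counter of neighboring tokens."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_words(tokens_list, n=2, min_count=2, min_entropy=1.0):
    """Score n-grams by the smaller of their left and right branch entropies.

    tokens_list: one token list per document (already segmented).
    Returns {ngram_tuple: score} for n-grams that pass both thresholds.
    All thresholds are hypothetical defaults chosen for illustration.
    """
    counts = Counter()
    left = defaultdict(Counter)   # tokens seen immediately before each n-gram
    right = defaultdict(Counter)  # tokens seen immediately after each n-gram
    for tokens in tokens_list:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            if i > 0:
                left[gram][tokens[i - 1]] += 1
            if i + n < len(tokens):
                right[gram][tokens[i + n]] += 1

    result = {}
    for gram, count in counts.items():
        # Skip rare n-grams and those with no observed neighbor on one side.
        if count < min_count or not left[gram] or not right[gram]:
            continue
        score = min(branch_entropy(left[gram]), branch_entropy(right[gram]))
        if score >= min_entropy:
            result[gram] = score
    return result
```

Under the method described in the abstract, a function like this would be run once per category corpus, so that each agricultural industry gets its own candidate list.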

Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification
DUAN Qingling, ZHANG Lu, LIU Yiran and WANG Shasha. Automatic Extraction Method of Hot Words Based on Agricultural Network Information Classification[J]. Transactions of the Chinese Society of Agricultural Machinery, 2018, 49(7): 160-167.
Authors: DUAN Qingling, ZHANG Lu, LIU Yiran and WANG Shasha
Institution: China Agricultural University; Agricultural Information Technology Limited Liability Company of Beijing
Abstract: With the rapid development of the Internet, the volume of network information, including agricultural network information, is growing quickly. Extracting hot words from this mass of information is of great significance for monitoring and analyzing agricultural public opinion. Hot word extraction has been studied before, but existing methods are poorly targeted and cannot meet the personalized needs of users in different agricultural industries. Therefore, a method for automatically extracting hot words based on agricultural network information classification was proposed. Firstly, the texts were classified with a multi-label classification algorithm, and multiple corpora were built according to the classification categories. Secondly, hot word candidates were extracted for each category with a method based on information entropy. Thirdly, the heat of each candidate was calculated with a method based on time variation. Finally, the candidates were sorted by heat, and the hot words were obtained from the sorted results. A total of 15 354 texts were collected from agricultural websites for the experiment, and the hot words for a specified time period were obtained automatically. The experimental results showed that the accuracy was over 0.9, indicating that the proposed method can extract agricultural hot words with high quality and help different agricultural user groups find and analyze hot spot information in their industry.
Keywords:agricultural network information  agricultural public opinion monitoring  hot word  multi-label classification  heat calculation
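
The abstract says that candidate heat is calculated "based on time variation" and that hot words are taken from the heat ranking, but it does not state the formula. As a rough illustration only, the sketch below scores a word by how far its document frequency in the current time window rises above its average frequency in earlier windows. The formula, the `smoothing` parameter, and the function names `heat_scores` and `top_hot_words` are hypothetical and should not be read as the authors' definition.

```python
from collections import Counter


def heat_scores(current_docs, past_windows, smoothing=1.0):
    """Score each word in the current window by its rise over past windows.

    current_docs: list of token lists for the window of interest.
    past_windows: list of earlier windows, each a list of token lists.
    Heat here is (f_now - baseline) / (baseline + smoothing), an assumed
    formula in which f_now and baseline are document frequencies.
    """
    # Document frequency in the current window (each doc counts once per word).
    now = Counter(word for doc in current_docs for word in set(doc))
    scores = {}
    for word, f_now in now.items():
        past = [sum(1 for doc in window if word in doc) for window in past_windows]
        baseline = sum(past) / len(past) if past else 0.0
        scores[word] = (f_now - baseline) / (baseline + smoothing)
    return scores


def top_hot_words(scores, k=10):
    """Return the k highest-heat words, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

With a score of this shape, a word that is always frequent (a stable baseline) ranks below a word whose frequency has recently jumped, which is the behavior a time-based heat measure is meant to capture.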
This article is indexed in CNKI and other databases.