Research on the Construction of a Chinese Word Segmentation Dictionary for an Agricultural Vertical Search Engine
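Cite this article: ZHANG Qi-yu, YU Hui-hui, CHEN Ying-yi, et al. Research on the construction of a Chinese word segmentation dictionary for an agricultural vertical search engine[J]. Guangdong Agricultural Sciences, 2015, 42(3): 165-169.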
Authors: ZHANG Qi-yu, YU Hui-hui, CHEN Ying-yi, WANG Lei
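Affiliations: 1. Yantai Academy, China Agricultural University, Yantai, Shandong 264670
2. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083
3. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083; Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture, Beijing 100083
4. Institute of Science and Technology Information, Shandong Academy of Agricultural Sciences, Jinan, Shandong 250100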
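Abstract: Chinese word segmentation is an important research direction in the study of agricultural vertical search engines. Traditional agricultural vertical search engines extract information inaccurately and slowly. To address this, we took the double-array Trie as the basic model, mapped the zone-position code (the GB 2312 quwei code) of each entry's first character to a row number in a database table, and defined part-of-speech codes for agricultural vocabulary according to the needs of an agricultural vertical search engine. Using a MySQL database as an example, we designed a segmentation dictionary dedicated to the agricultural domain. The dictionary takes full advantage of the database for organizing entries, allows the lexicon to be shared remotely and maintained jointly, and is easy for different systems to access. Entries are stored in groups by first character, with a double-array Trie built for each group, which effectively reduces the memory needed during construction. The structure of this agricultural segmentation dictionary can also serve as a reference for other fields and industries.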
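The abstract states that each entry's first character is located through its zone-position code, which corresponds to a row number in a database table, but it does not give the exact mapping. The Python sketch below assumes one natural choice: GB 2312 places Chinese characters in zones 16 to 87 with 94 positions per zone, so a character in zone q, position w is mapped to the 1-based row (q - 16) * 94 + w. The helper names `zone_position`, `quwei_code`, `row_number` and `group_by_first_char` are hypothetical, not taken from the paper.

```python
from collections import defaultdict

# GB 2312 stores Chinese characters in zones 16..87, 94 positions per zone.
FIRST_HANZI_ZONE = 16
LAST_HANZI_ZONE = 87
POSITIONS_PER_ZONE = 94


def zone_position(ch):
    """Return the GB 2312 (zone, position) pair of one double-byte character."""
    encoded = ch.encode("gb2312")  # raises UnicodeEncodeError outside GB 2312
    if len(encoded) != 2:
        raise ValueError(f"{ch!r} is not a double-byte GB 2312 character")
    # Each byte is offset by 0xA0 from its zone / position number.
    return encoded[0] - 0xA0, encoded[1] - 0xA0


def quwei_code(ch):
    """Four-digit zone-position code, e.g. zone 16, position 1 -> 1601."""
    zone, pos = zone_position(ch)
    return zone * 100 + pos


def row_number(ch):
    """Hypothetical mapping from a first character to a 1-based table row."""
    zone, pos = zone_position(ch)
    if not FIRST_HANZI_ZONE <= zone <= LAST_HANZI_ZONE:
        raise ValueError(f"{ch!r} is not a GB 2312 Chinese character")
    return (zone - FIRST_HANZI_ZONE) * POSITIONS_PER_ZONE + pos


def group_by_first_char(words):
    """Bucket dictionary entries by the table row of their first character."""
    groups = defaultdict(list)
    for word in words:
        groups[row_number(word[0])].append(word)
    return dict(groups)


print(quwei_code("啊"), row_number("阿"))  # 1601 2
```

Here "啊" and "阿" are the first two Chinese characters in GB 2312 (codes 1601 and 1602), so they fall in rows 1 and 2. Because zone 55 holds only 89 characters, a few rows in this assumed scheme stay empty.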

Keywords: Chinese word segmentation; agricultural dictionary; MySQL; part-of-speech coding

Construction of Chinese word segmentation dictionary based on agricultural vertical search engine
ZHANG Qi-yu,YU Hui-hui,CHEN Ying-yi,WANG Lei.Construction of Chinese word segmentation dictionary based on agricultural vertical search engine[J].Guangdong Agricultural Sciences,2015,42(3):165-169.
Authors: ZHANG Qi-yu, YU Hui-hui, CHEN Ying-yi, WANG Lei
Institutions: Yantai Academy, China Agricultural University; College of Information and Electrical Engineering, China Agricultural University; Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture; Institute of Information Technology, Shandong Academy of Agricultural Sciences
Abstract: Chinese word segmentation is an important research direction in agricultural vertical search engine research. Traditional agricultural vertical search engines suffer from inaccurate and slow information extraction. In this paper, the double-array Trie was adopted as the basic model to design a word segmentation dictionary specifically for agriculture, based on a MySQL database. The dictionary makes full use of the database for organizing its entries; the lexicon can be shared remotely and maintained jointly, and different systems can access it conveniently. In the dictionary, the zone-position code of each entry's first Chinese character corresponds to a row number in the database table, and part-of-speech codes for agricultural words were defined according to the needs of the agricultural vertical search engine. Entries are stored in groups by first character, with a double-array Trie built for each group, which effectively reduces the memory required during construction. The structure of this agricultural segmentation dictionary can also serve as a reference for other fields.
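The abstract describes storing entries by first character and building a double-array Trie per group. The Python sketch below shows a minimal static double-array Trie: parallel `base` and `check` arrays, where a transition from state s on label c goes to t = base[s] + c and is valid only if check[t] == s, filled breadth-first from an ordinary Trie. It is not the authors' implementation. The per-group character numbering, the reserved end-of-word label 0, the linear search for free base values, the names `DoubleArrayTrie`, `build_grouped_dictionary` and `lookup`, and part-of-speech codes such as `n_crop` are all assumptions for illustration; groups are keyed by the first character itself rather than by a database row.

```python
from collections import defaultdict, deque


class DoubleArrayTrie:
    """Minimal static double-array Trie mapping strings to values (e.g. POS codes)."""

    END = 0  # reserved end-of-word label; real characters get labels >= 1

    def __init__(self, entries):
        # Assumed per-group alphabet: number this group's characters 1..k.
        chars = sorted({c for word in entries for c in word})
        self.code = {c: i + 1 for i, c in enumerate(chars)}
        self.base = [0]
        self.check = [0]  # slot 0 is the root; unused slots hold -1
        self.value = {}   # end-of-word slot -> stored value

        # Step 1: build an ordinary nested-dict Trie keyed by integer labels.
        root = {}
        for word, val in entries.items():
            node = root
            for c in word:
                node = node.setdefault(self.code[c], {})
            node[self.END] = val

        # Step 2: place nodes into the arrays breadth-first.
        queue = deque([(0, root)])
        while queue:
            state, node = queue.popleft()
            labels = sorted(node)
            b = 1  # linear search for the smallest base whose target slots are free
            while any(self._used(b + label) for label in labels):
                b += 1
            self.base[state] = b
            for label in labels:
                t = b + label
                self._ensure(t)
                self.check[t] = state
                if label == self.END:
                    self.value[t] = node[label]
                else:
                    queue.append((t, node[label]))

    def _used(self, index):
        return index < len(self.check) and self.check[index] != -1

    def _ensure(self, index):
        if index >= len(self.check):
            grow = index + 1 - len(self.check)
            self.base.extend([0] * grow)
            self.check.extend([-1] * grow)

    def get(self, word):
        """Return the stored value for word, or None if it is not an entry."""
        state = 0
        for c in word:
            label = self.code.get(c)
            if label is None:
                return None
            t = self.base[state] + label
            if t >= len(self.check) or self.check[t] != state:
                return None
            state = t
        t = self.base[state] + self.END
        if t < len(self.check) and self.check[t] == state:
            return self.value.get(t)
        return None


def build_grouped_dictionary(entries):
    """Group entries by first character; build one small Trie per group."""
    groups = defaultdict(dict)
    for word, pos in entries.items():
        groups[word[0]][word[1:]] = pos  # store the remainder after the first char
    return {first: DoubleArrayTrie(rest) for first, rest in groups.items()}


def lookup(grouped, word):
    trie = grouped.get(word[:1])
    return trie.get(word[1:]) if trie is not None else None


# Hypothetical agricultural entries with made-up POS codes.
dictionary = build_grouped_dictionary({
    "水": "n", "水稻": "n_crop", "水稻螟虫": "n_pest", "农药": "n_agrochem",
})
print(lookup(dictionary, "水稻螟虫"))  # n_pest
```

In this simplified form, each group's arrays only have to hold the entries that share one first character, so the free-slot search during construction scans small arrays rather than one array sized for the whole lexicon, which is the kind of construction-memory saving the abstract describes.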
Keywords: Chinese word segmentation; agricultural dictionary; MySQL; part-of-speech coding
This article is indexed in CNKI, Wanfang Data, and other databases.