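Rice knowledge text classification method based on deep convolutional neural network
Cite this article: FENG Shuai, XU Tongyu, ZHOU Yuncheng, ZHAO Dongxue, JIN Ning, WANG Haoriqin. Rice knowledge text classification method based on deep convolutional neural network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(3): 257-264
Authors: FENG Shuai, XU Tongyu, ZHOU Yuncheng, ZHAO Dongxue, JIN Ning, WANG Haoriqin
Affiliations: College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110161; College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110161; Liaoning Agricultural Informatization Engineering Technology Center, Shenyang Agricultural University, Shenyang 110161
Funding: National Key Research and Development Program of China (2018YFD0300309)
Abstract: To address inaccurate text feature extraction and the degradation of classification performance as the network becomes deeper, a rice knowledge text classification method based on a deep convolutional neural network was proposed. Considering the characteristics of rice knowledge text, Word2Vec was used for text vectorization and compared with the One-Hot, TF-IDF and Hashing methods; Word2Vec achieved higher classification precision, with an accuracy of 86.44%, and can...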

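Keywords: rice knowledge text; text classification; deep convolutional neural network; vectorization; feature extraction; classification model
Received: 2020-06-13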
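The abstract contrasts Word2Vec with three sparse vectorization schemes. The sketch below, built on a hypothetical three-document toy corpus rather than the paper's data, shows how One-Hot, TF-IDF and Hashing (feature hashing) vectors are formed and why they are sparse: each document fills only a few slots of a vector whose length equals the vocabulary size or a fixed number of hash buckets. The smoothed IDF variant, the CRC32 bucket function and the bucket count of 8 are illustrative assumptions, not the authors' settings. Word2Vec instead learns a short dense vector per word and is not reproduced here.

```python
import math
import zlib
from collections import Counter

# Hypothetical pre-segmented documents (toy data, not from the paper).
docs = [
    ["稻瘟病", "防治", "方法"],
    ["稻飞虱", "防治", "药剂"],
    ["水稻", "施肥", "方法"],
]

# One-Hot and TF-IDF need a vocabulary: one vector slot per distinct word.
vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}


def one_hot(doc):
    """Binary bag of words: 1 if the word occurs in the document, else 0."""
    vec = [0] * len(vocab)
    for w in doc:
        vec[index[w]] = 1
    return vec


def tf_idf(doc):
    """Term frequency (count / document length) times a smoothed IDF."""
    n = len(docs)
    vec = [0.0] * len(vocab)
    for w, count in Counter(doc).items():
        df = sum(1 for d in docs if w in d)  # number of documents containing w
        idf = math.log((1 + n) / (1 + df)) + 1
        vec[index[w]] = count / len(doc) * idf
    return vec


def hashed(doc, n_features=8):
    """Feature hashing: bucket counts without a vocabulary (collisions possible)."""
    vec = [0] * n_features
    for w in doc:
        # CRC32 is deterministic across runs, unlike Python's salted str hash().
        vec[zlib.crc32(w.encode("utf-8")) % n_features] += 1
    return vec


for doc in docs:
    print(one_hot(doc), [round(x, 2) for x in tf_idf(doc)], hashed(doc))
```

Because 稻瘟病 occurs in only one document, its TF-IDF weight in the first document exceeds that of 防治, which occurs in two; One-Hot treats both words identically.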

Rice Knowledge Text Classification Based on Deep Convolution Neural Network
FENG Shuai,XU Tongyu,ZHOU Yuncheng,ZHAO Dongxue,JIN Ning,WANG Haoriqin. Rice Knowledge Text Classification Based on Deep Convolution Neural Network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(3): 257-264
Authors: FENG Shuai, XU Tongyu, ZHOU Yuncheng, ZHAO Dongxue, JIN Ning, WANG Haoriqin
Affiliation: Shenyang Agricultural University
Abstract: Extracting information on rice weeds, pests, diseases and cultivation management from agricultural text is a typical text classification problem, and it is fundamental to key text information extraction, text data mining and agricultural intelligent question answering. Chinese texts, especially agricultural texts, are redundant, sparse and poorly standardized, which makes them difficult to classify. Deep learning, by contrast, can extract the key features of a text automatically, and the resulting models adapt and transfer well. Therefore, to address the degradation in classification performance caused by inaccurate text feature extraction and increasing network depth, a rice knowledge text classification method for question-answering systems was proposed. The Python crawling framework Scrapy was used to collect Chinese text on rice diseases, insect pests, weeds, and cultivation management from sources such as the Hownet online expert system and a planting question-and-answer website; these texts served as training and test samples. The texts were segmented with Jieba, and useless symbols and stop words were removed. Because Chinese word segmentation results depend heavily on the segmentation lexicon, a rice-related corpus was built on the basis of the Sogou agricultural corpus to improve the segmentation precision of rice knowledge text and reduce incorrect and missed segmentation. This corpus extended Jieba's base dictionary and improved the recognition of specialized terms for rice diseases, insect pests, weeds, pesticides, and cultivation management. 
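Segmentation quality depends on the lexicon, which is why the authors extended Jieba's dictionary with a rice corpus. Jieba itself builds a DAG of candidate words from a prefix dictionary, picks the maximum-probability path and uses an HMM for unknown words; the sketch below swaps in a simpler forward maximum matching segmenter so the effect of adding domain terms is easy to see. The sample sentence, both lexicons and the stop-word list are hypothetical. In Jieba, custom entries are added with `jieba.load_userdict` or `jieba.add_word`.

```python
def fmm_segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when no word matches."""
    max_len = max(map(len, lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def remove_stop_words(words, stop_words):
    """Drop function words and other stop words after segmentation."""
    return [w for w in words if w not in stop_words]


# Hypothetical general lexicon plus one added rice-pest term
# ("稻纵卷叶螟", rice leaf folder).
base_lexicon = {"水稻", "防治", "方法"}
rice_terms = {"稻纵卷叶螟"}
stop_words = {"的"}
text = "水稻稻纵卷叶螟的防治方法"

print(fmm_segment(text, base_lexicon))
# ['水稻', '稻', '纵', '卷', '叶', '螟', '的', '防治', '方法']
print(remove_stop_words(fmm_segment(text, base_lexicon | rice_terms), stop_words))
# ['水稻', '稻纵卷叶螟', '防治', '方法']
```

Without the domain entry the pest name shatters into single characters, which then carry little meaning for a downstream classifier.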
Word2Vec was used to vectorize the text data and was compared with the One-Hot, TF-IDF and Hashing methods; Word2Vec effectively alleviated typical text-vector problems such as sparsity and incomplete information. Starting from the basic ResNet structure, nine rice knowledge text classification models were constructed by varying the design of the residual module and the network depth. Test results showed that the network with a 4-layer residual module structure extracted features accurately, with a Top-1 accuracy of 99.79%. In a convolutional neural network, the pooling layer performs downsampling, which discards some of the relative-position information of text phrases and thus reduces classification accuracy. Therefore, the optimized 4-layer residual module structure was taken as the backbone, the pooling layer was replaced with a capsule network (CapsNet), and the resulting rice knowledge text classification model was named RIC-Net. Comparison with six text classification models (FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN) showed that the proposed model precisely classifies rice knowledge texts of different sample sizes and complexity levels, with precision, recall and F1 score of no less than 95.17%, 95.83% and 95.50%, respectively, and an accuracy of up to 98.62%. The model classifies rice knowledge text accurately and efficiently and meets practical application requirements.
Keywords: rice knowledge text; text classification; deep convolution neural network; vectorization; feature extraction; classification model
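The English abstract reports three lower bounds for RIC-Net. Reading the first of them as precision and assuming all three come from the same evaluation (both are assumptions, since per-class results are not given here), the standard F1 definition, the harmonic mean F1 = 2PR / (P + R), reproduces the reported F1 bound from the other two, as this quick check shows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


# Lower bounds from the abstract, read as precision and recall (assumption).
precision, recall = 95.17, 95.83
print(f"{f1_score(precision, recall):.2f}")  # 95.50
```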
This article is indexed in databases including CNKI, VIP and Wanfang Data.