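Research on a Recognition Model for Navigation Pages of Agricultural Websites
Cite this article: WANG Shuang-shuang, ZHANG Tai-hong, FENG Xiang-ping, CHEN Yan-hong, MA Jian. Research on a Recognition Model for Navigation Pages of Agricultural Websites[J]. Journal of Xinjiang Agricultural University, 2011, 0(5): 447-453
Authors: WANG Shuang-shuang  ZHANG Tai-hong  FENG Xiang-ping  CHEN Yan-hong  MA Jian
Affiliation: College of Computer and Information Engineering, Xinjiang Agricultural University
Funding: Science and Technology Key Research Project of Xinjiang Uygur Autonomous Region (200931103)
Abstract: Agricultural websites contain a large number of navigation pages that carry no substantive information. To address this, the paper proposes building a navigation-page recognition model for agricultural websites that considers the text features and non-text features of web pages together. For these two classes of features, experiments and comparative analysis were carried out using the HTML-Parser web page parser, the Paoding (庖丁解牛) word segmenter, and the chi-square test algorithm, combined with least-squares multiple linear regression. Tests on 5,000 training pages and 1,400 test pages show that a classifier built from the combined text and non-text feature sets recognizes agricultural navigation pages well. Once the number of feature words reaches 200 or more, accuracy reaches about 94% and tends to stabilize.

Keywords: navigation pages  web page recognition  feature selection  multiple linear regression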
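
The pipeline named in the abstract (chi-square selection of feature words, then a least-squares multiple linear regression over text and non-text features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary term-presence encoding, the single non-text feature `link_density`, the 0.5 decision threshold, and the toy data are all assumptions made for illustration, and the parsing and segmentation steps (HTML-Parser, Paoding) are assumed to have already produced the input matrices.

```python
import numpy as np

def chi_square_scores(term_presence, labels):
    """Chi-square statistic of each binary term feature against a binary label."""
    X = np.asarray(term_presence, dtype=float)  # (n_pages, n_terms), entries 0/1
    y = np.asarray(labels, dtype=float)         # (n_pages,), 1 = navigation page
    n = X.shape[0]
    a = X.T @ y              # term present, navigation page
    b = X.T @ (1 - y)        # term present, content page
    c = y.sum() - a          # term absent, navigation page
    d = (1 - y).sum() - b    # term absent, content page
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # Terms that occur on every page or on no page give den == 0; score them 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def select_top_terms(term_presence, labels, k):
    """Indices of the k terms with the highest chi-square score."""
    scores = chi_square_scores(term_presence, labels)
    return np.argsort(-scores, kind="stable")[:k]

def add_intercept(features):
    X = np.asarray(features, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_linear_classifier(features, labels):
    """Ordinary least-squares fit of the labels on the features plus an intercept."""
    w, *_ = np.linalg.lstsq(add_intercept(features),
                            np.asarray(labels, dtype=float), rcond=None)
    return w

def predict(w, features, threshold=0.5):
    """Label a page as navigation (1) when the regression output reaches the threshold."""
    return (add_intercept(features) @ w >= threshold).astype(int)

# Toy data (hypothetical): 6 pages, 3 candidate feature words, 1 non-text feature.
terms = np.array([[1, 1, 0], [1, 1, 1], [1, 1, 0],
                  [0, 1, 1], [0, 1, 0], [0, 1, 1]])
link_density = np.array([[0.9], [0.8], [0.85], [0.2], [0.1], [0.3]])
labels = np.array([1, 1, 1, 0, 0, 0])

selected = select_top_terms(terms, labels, k=1)
combined = np.hstack([terms[:, selected], link_density])  # text + non-text features
weights = fit_linear_classifier(combined, labels)
print(selected, predict(weights, combined))
```

In this sketch the number of retained feature words is the parameter `k`; the abstract reports that accuracy stabilizes once that number reaches 200 or more, which a toy example of this size cannot reproduce.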

Research on the Recognition Model for Navigation Pages of Agricultural Websites
WANG Shuang-shuang, ZHANG Tai-hong, FENG Xiang-ping, CHEN Yan-hong, MA Jian. Research on the Recognition Model for Navigation Pages of Agricultural Websites[J]. Journal of Xinjiang Agricultural University, 2011, 0(5): 447-453
Authors: WANG Shuang-shuang  ZHANG Tai-hong  FENG Xiang-ping  CHEN Yan-hong  MA Jian
Affiliation: College of Computer & Information Engineering, Xinjiang Agricultural University, Urumqi 830052, China
Abstract: Agricultural websites contain a great number of navigation pages that carry no practical information. This study discusses a method that combines the text features and non-text features of web pages to construct a recognition model for navigation pages of agricultural websites. Navigation pages of agricultural websites have two types of features: text features and non-text features. Experiments and comparative analysis were carried out using the HTML Parser web pa...
Keywords: navigation pages  recognition of web pages  feature selection  multivariate linear regression
This article is indexed in CNKI, VIP (维普), and other databases.