Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network

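Cite this article:	Yang Guoguo, Bao Yidan, Liu Ziyi. Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering, 2017, 33(6): 156-162. DOI: 10.11975/j.issn.1002-6819.2017.06.020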

Authors:	Yang Guoguo, Bao Yidan, Liu Ziyi

Affiliation:	College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China

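Funding:	National Natural Science Foundation of China (31471417); Research Fund for the Doctoral Program of Higher Education of China (20130101110104)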

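Abstract:	To identify pest targets quickly and accurately in tea plantations, this paper proposes a method for localizing and recognizing pests with a deep learning model based on a convolutional neural network (CNN). The method applies accelerated color-reduction processing to the whole image and, taking into account the spatial influence between superpixel regions, computes a saliency value for each superpixel region, which provides candidate regions for the pest targets; the GrabCut algorithm is then used to localize and segment the pests. The segmented pests are represented and classified by an optimized CNN, whose structure is further reduced. In recognition experiments on 23 major tea-plantation pest species, the recognition accuracy was 0.915 before optimization and 0.881 after, and the optimized model's memory requirement and running time dropped to 6 MB and 0.7 ms, respectively, giving good recognition performance.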
Keywords:	pixels; algorithms; identification; pest detection; image saliency analysis; deep learning; convolutional neural network
Received:	2016-09-19
Revised:	2016-02-20
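The saliency step described in this abstract can be sketched as a region-contrast computation. The sketch assumes superpixel segmentation has already been done and that each region is given as a list of `((x, y), (r, g, b))` samples with coordinates normalized to [0, 1]; it quantizes each RGB channel to 10 levels (1000 colors), as the English abstract states. The exponential spatial weight, `sigma2 = 0.4`, and the hypothetical `grabcut_labels` helper with its thresholds are illustrative choices, not the authors' published formulas. The label values mirror OpenCV's `GC_BGD`, `GC_FGD`, `GC_PR_BGD`, and `GC_PR_FGD` mask constants, so a per-pixel mask built from them could seed `cv2.grabCut` with `cv2.GC_INIT_WITH_MASK`.

```python
import math
from collections import Counter

LEVELS = 10  # levels per RGB channel -> 10**3 = 1000 colors

# Mask values matching OpenCV's cv2.GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def quantize(rgb, levels=LEVELS):
    """Map an (r, g, b) pixel with 0-255 channels to one of levels**3 color bins."""
    r, g, b = (c * levels // 256 for c in rgb)
    return (r * levels + g) * levels + b


def bin_center(index, levels=LEVELS):
    """Representative RGB color at the center of a quantized bin."""
    r, rest = divmod(index, levels * levels)
    g, b = divmod(rest, levels)
    step = 256 / levels
    return tuple((v + 0.5) * step for v in (r, g, b))


def region_stats(region):
    """Normalized color histogram, centroid and pixel count of one region."""
    n = len(region)
    hist = Counter(quantize(rgb) for _, rgb in region)
    cx = sum(p[0] for p, _ in region) / n
    cy = sum(p[1] for p, _ in region) / n
    return {k: v / n for k, v in hist.items()}, (cx, cy), n


def color_distance(h1, h2):
    """Histogram-weighted Euclidean distance between the colors of two regions."""
    return sum(f1 * f2 * math.dist(bin_center(b1), bin_center(b2))
               for b1, f1 in h1.items() for b2, f2 in h2.items())


def region_saliency(regions, sigma2=0.4):
    """Saliency of each region: color contrast to every other region,
    weighted by that region's size and damped by spatial distance."""
    stats = [region_stats(r) for r in regions]
    raw = []
    for k, (hist_k, centroid_k, _) in enumerate(stats):
        total = 0.0
        for i, (hist_i, centroid_i, size_i) in enumerate(stats):
            if i != k:
                spatial = math.exp(-math.dist(centroid_k, centroid_i) / sigma2)
                total += spatial * size_i * color_distance(hist_k, hist_i)
        raw.append(total)
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0
    return [(s - lo) / span for s in raw]  # normalized to [0, 1]


def grabcut_labels(saliency, fg_thresh=0.7, bg_thresh=0.2):
    """Hypothetical thresholds turning region saliency into GrabCut mask labels."""
    labels = []
    for s in saliency:
        if s >= fg_thresh:
            labels.append(GC_PR_FGD)  # likely pest
        elif s <= bg_thresh:
            labels.append(GC_BGD)     # confidently background
        else:
            labels.append(GC_PR_BGD)  # uncertain, left for GrabCut to decide
    return labels


if __name__ == "__main__":
    green, red = (30, 160, 40), (200, 30, 30)
    left = [((0.25, 0.5), green)] * 40   # large background region
    right = [((0.75, 0.5), green)] * 40  # large background region
    pest = [((0.5, 0.5), red)] * 8       # small, differently colored region
    sal = region_saliency([left, right, pest])
    print(sal, grabcut_labels(sal))
```

Running the script prints `[0.0, 0.0, 1.0] [0, 0, 3]`: the small red region contrasts with both large green regions and becomes probable foreground, while the two green regions are labeled background.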
Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network

Yang Guoguo, Bao Yidan and Liu Ziyi. Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering, 2017, 33(6): 156-162. DOI: 10.11975/j.issn.1002-6819.2017.06.020

Authors:	Yang Guoguo, Bao Yidan, Liu Ziyi

Affiliation:	College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China (all three authors)

Abstract:	Tea is an important cash crop in China, and computer vision plays an important role in pest detection. Automatic classification of insect species in the field is more difficult than generic object classification because of complex field backgrounds and the high visual similarity among insect species. In this paper, we propose an insect recognition system based on image saliency analysis and a deep learning model, the convolutional neural network (CNN), which is robust because it does not rely on manually selected features. For image saliency analysis, we first segmented the original images into superpixel regions. We then quantized each RGB (red, green, blue) color channel to 10 levels, which reduced the number of colors to 1000 and sped up computation of the region-level color contrast between pest objects and the background. Finally, we obtained the saliency value of each region by combining its color contrast with the spatial distances between regions. The saliency values of all regions in an image formed a saliency map, which served as the initial region for the GrabCut algorithm to segment and localize the pest object. The localized images were resized to 256×256 pixels for CNN training and classification. The CNN was trained end to end, from raw pixels to final categories, removing the need to manually design a feature extractor. Because the architecture includes a number of sensitive parameters and optimization strategies that can be tuned, we used theoretical analysis and experimental evaluation to optimize the critical structural parameters and training strategy of the CNN and find the best configuration. We set the local receptive field size, the number of convolution kernels, and the convolution stride to 7×7 pixels, 64, and 4, respectively. The dropout ratio for the fully connected layers was 0.7.
The softmax loss function proved suitable for the pest classification system. To further improve the practical utility of the CNN, we investigated structural changes to the architecture that make it run faster with little effect on performance. We analyzed the accuracy and corresponding runtime of the model while reducing its depth (number of layers) and width (number of convolution kernels in each layer). Removing the fully connected layers (FC6, FC7) made only a slight difference to performance. These layers contained almost 90% of the parameters, and removing them decreased memory consumption to 29.8 MB. In contrast, removing the intermediate convolutional layers (Conv2, Conv3, Conv4, Conv5) caused a dramatic decrease in both accuracy and runtime. This suggests that the intermediate convolutional layers account for most of the computational cost and that their depth is important for achieving good results. We then investigated the effect of shrinking all convolutional layers, reducing the number of filters in each convolutional layer by 64 at a time. Surprisingly, all of these architectures ran significantly faster with relatively small effects on performance. Finally, we set the numbers of convolution kernels in Conv2 to Conv5 to 64, 192, 192, and 64, respectively. On the test set of tea-field images, the architecture achieved an average accuracy (AA) of 0.915 before shrinking and 0.881 after, superior to previous methods for pest image recognition. After optimization, the running time was reduced to 0.7 ms and the memory requirement to 6 MB.
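To see why trimming layer width pays off, a short parameter count helps. The sketch below assumes float32 weights and square kernels. Only the Conv1 settings (7×7, 64 kernels) and the Conv2 to Conv5 widths (64, 192, 192, 64) come from the abstract; the 5×5 and 3×3 kernel sizes for Conv2 to Conv5 and the `shrink` helper are hypothetical placeholders, so the totals are illustrative and are not meant to reproduce the reported 6 MB.

```python
from dataclasses import dataclass

BYTES_PER_PARAM = 4  # assumption: float32 weights


@dataclass
class ConvLayer:
    name: str
    kernel: int        # side length of a square kernel
    out_channels: int  # number of convolution kernels (layer width)


def conv_param_counts(layers, in_channels=3):
    """Weights plus biases for each layer of a plain convolution chain."""
    counts = {}
    c_in = in_channels
    for layer in layers:
        weights = layer.kernel * layer.kernel * c_in * layer.out_channels
        counts[layer.name] = weights + layer.out_channels
        c_in = layer.out_channels  # next layer sees this many input channels
    return counts


def size_mb(param_count):
    """Storage for `param_count` parameters in mebibytes."""
    return param_count * BYTES_PER_PARAM / 2**20


def shrink(layers, name, step=64):
    """Hypothetical helper: copy of `layers` with one layer narrowed by `step` kernels."""
    return [ConvLayer(layer.name, layer.kernel, layer.out_channels - step)
            if layer.name == name else layer
            for layer in layers]


# Conv2-Conv5 kernel sizes below are placeholders, not values from the paper.
reduced = [
    ConvLayer("Conv1", 7, 64),
    ConvLayer("Conv2", 5, 64),
    ConvLayer("Conv3", 3, 192),
    ConvLayer("Conv4", 3, 192),
    ConvLayer("Conv5", 3, 64),
]

if __name__ == "__main__":
    counts = conv_param_counts(reduced)
    total = sum(counts.values())
    print(counts, total, round(size_mb(total), 2))
    print(sum(conv_param_counts(shrink(reduced, "Conv4")).values()))
```

Narrowing one layer also shrinks the next layer's weights, because that layer's input channel count drops too: here, cutting Conv4 by 64 kernels removes 147520 parameters from Conv4 and Conv5 together (665344 down to 517824).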

Keywords:	pixels; algorithms; identification; pest detection; image saliency analysis; deep learning; convolutional neural network
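The training-side settings reported in the English abstract (7×7 receptive fields with stride 4, a dropout ratio of 0.7 on the fully connected layers, a softmax loss, 23 pest classes, 256×256 inputs) can be made concrete with a few standard-library helpers. This is a sketch under stated assumptions: the 256×256 image is fed to the first convolution without cropping or padding, and dropout is the inverted form in which the ratio is the probability of zeroing a unit, the convention used by Caffe's `dropout_ratio`. The function names are hypothetical and do not come from the authors' code.

```python
import math
import random

NUM_CLASSES = 23  # pest species in the study


def conv_output_size(size, kernel=7, stride=4, padding=0):
    """Spatial output size of a convolution (padding=0 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def dropout(activations, ratio=0.7, rng=random):
    """Inverted dropout: zero each unit with probability `ratio` and rescale
    survivors by 1 / (1 - ratio), so the expected activation matches the
    no-dropout network used at test time."""
    keep = 1.0 - ratio
    return [a / keep if rng.random() >= ratio else 0.0 for a in activations]


def softmax(logits):
    """Numerically stable softmax (subtracts the max logit before exp)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits, label):
    """Cross-entropy of the softmax output for the true class index."""
    return -math.log(softmax(logits)[label])


if __name__ == "__main__":
    print(conv_output_size(256))                # 7x7 kernels, stride 4
    print(softmax_loss([0.0] * NUM_CLASSES, 0))  # uniform logits: ln 23
    print(dropout([1.0] * 10, rng=random.Random(0)))
```

Under these assumptions a 256×256 input yields a 63×63 map per first-layer kernel, and uniform logits over 23 classes give a loss of ln 23 ≈ 3.135, the baseline a trained classifier should fall well below.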
This article is indexed in CNKI, Wanfang Data, and other databases.