
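Transformer-based apple leaf disease identification model with strong generalization
Cite this article: Xu Yanlei, Kong Shuolin, Chen Qingyuan, Gao Zhiyuan, Li Chenxiao. Transformer-based apple leaf disease identification model with strong generalization[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(16): 198-206.
Authors: Xu Yanlei  Kong Shuolin  Chen Qingyuan  Gao Zhiyuan  Li Chenxiao
Affiliation: College of Information Technology, Jilin Agricultural University, Changchun 130118
Funding: International Science and Technology Cooperation Project of the Jilin Provincial Department of Science and Technology (20200801014GH); Key Science and Technology Research Project of the Changchun Science and Technology Bureau (21ZGN28)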
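Abstract: Generalization ability is key to deploying disease identification models across multiple scenarios. Targeting apple leaf disease data collected in different environments, this study proposes CaTNet, an apple leaf disease identification model with strong generalization that can extract multiple types of features. The model uses a two-branch structure. First, a convolutional neural network (CNN) branch is designed to extract local features from apple leaf images. Second, a vision Transformer branch with squeeze and expansion functions is built to extract global features from the images. Finally, the two kinds of features are fused, so that the Transformer branch can learn local features and the CNN branch can learn global features. Compared with a range of CNN and Transformer models, the proposed model generalizes better: trained only on leaf data from a laboratory environment, it reaches 80% identification accuracy on natural-environment data, 7.21 percentage points higher than the CNN EfficientNetV2 and 26.63 percentage points higher than the Transformer network PVT. It effectively improves identification accuracy on data from different environments and addresses the high training cost and weak generalization of deep learning models.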
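The gains in the abstract are stated in percentage points, so the accuracy of each baseline on natural-environment data can be recovered by subtraction. The sketch below does only that arithmetic. It takes the quoted 80% at face value, although the figure may be rounded, so the results are approximate. The function and variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract. The gains are percentage points, not relative percent.
CATNET_ACCURACY = 80.0  # as quoted; possibly rounded
GAINS_IN_POINTS = {
    "EfficientNetV2": 7.21,
    "PVT": 26.63,
}


def implied_baseline(model_accuracy: float, gain_points: float) -> float:
    """Return the baseline accuracy implied by an absolute gain in percentage points."""
    return round(model_accuracy - gain_points, 2)


implied = {
    name: implied_baseline(CATNET_ACCURACY, gain)
    for name, gain in GAINS_IN_POINTS.items()
}

for name, accuracy in implied.items():
    print(f"{name}: about {accuracy:.2f}% on natural-environment data")
```

Under this reading, EfficientNetV2 sits near 72.79% and PVT near 53.37% when trained on laboratory data only.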

Keywords: image identification  agriculture  convolutional neural networks  apple leaf disease  Transformer model  strong generalization  feature fusion
Received: 2022/4/18
Revised: 2022/8/1

Model for identifying strong generalization apple leaf disease using Transformer
Xu Yanlei, Kong Shuolin, Chen Qingyuan, Gao Zhiyuan, Li Chenxiao. Model for identifying strong generalization apple leaf disease using Transformer[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(16): 198-206.
Authors: Xu Yanlei  Kong Shuolin  Chen Qingyuan  Gao Zhiyuan  Li Chenxiao
Institution: College of Information Technology, Jilin Agricultural University, Changchun 130118, China
Abstract: Apple diseases have posed a serious risk to orchard income in recent years. Accurate and rapid identification of apple diseases is of great benefit to disease prevention and control. Most identification models have been trained on laboratory data, mainly because it is difficult to deliberately infect apples in a real orchard. However, most such models cannot fully meet the requirements of disease detection in large-scale production. In this study, a deep learning model (called CaTNet) was proposed to extract both global and local information from images of diseased apple leaves. Disease images were collected from apple orchards in Jilin Province, China. A total of 16,464 images were obtained from several publicly available datasets together with laboratory and natural-environment data collected in the field. Firstly, a model structure was constructed that combines a Transformer with a convolutional neural network (CNN). Global and local information was extracted from the original images by the two branches. Learning a wider variety of features improved the generalization ability of the model. Meanwhile, the global features improved the model's resistance to interference. Secondly, the Transformer block in the Transformer branch was optimized to simplify its structure. In addition, a channel compression and expansion module was designed in the Transformer branch to reduce the training cost of CaTNet by lowering the channel dimension of the input features. The multiple multilayer perceptrons were then replaced by grouped convolutional layers to further improve the computational speed of the model. Thirdly, a lightweight CNN branch was constructed with an inverted residual structure that combines pointwise convolutions for channel expansion with 3×3 convolutions for information extraction.
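A parameter count makes the effect of replacing a multilayer perceptron with grouped convolutions concrete. The sketch below uses the standard count for a 1×1 convolution, whose weight tensor has shape (c_out, c_in / groups, 1, 1). With one group this equals a per-token linear layer. The channel width of 64, the expansion ratio of 4, and the 4 groups are hypothetical values chosen for illustration. The abstract does not state CaTNet's actual settings.

```python
def pointwise_params(c_in: int, c_out: int, groups: int = 1, bias: bool = True) -> int:
    """Parameter count of a 1x1 convolution; groups=1 matches a per-token linear layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out  # each output channel sees c_in / groups inputs
    return weights + (c_out if bias else 0)


def two_layer_block_params(channels: int, expansion: int, groups: int) -> int:
    """Expand channels by `expansion`, then project back, as in a Transformer MLP."""
    hidden = channels * expansion
    return (pointwise_params(channels, hidden, groups)
            + pointwise_params(hidden, channels, groups))


# Hypothetical sizes, not taken from the paper.
CHANNELS, EXPANSION, GROUPS = 64, 4, 4

dense_mlp = two_layer_block_params(CHANNELS, EXPANSION, groups=1)
grouped = two_layer_block_params(CHANNELS, EXPANSION, groups=GROUPS)

print(f"dense MLP parameters:    {dense_mlp}")
print(f"grouped conv parameters: {grouped}")
```

With these assumed sizes the weight count falls by the number of groups, while the bias terms are unchanged. The same factor applies to the multiply-accumulate operations per token.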
The CNN branch was used to extract the local features of the image, which made the model more sensitive to fine-grained features. Finally, a concat operation was used to fuse the feature outputs of the two branches. After fusion, the CNN branch extracted local features from the global ones, whereas the Transformer branch extracted global features from the local ones. Cycling the multiple features in this way also improved the generalization of the model. A comparison was made to clarify the effect of different down-sampling methods on the two-branch network. Specifically, accuracies of 79.35%, 74.06%, and 67.95% were obtained using pooling, a 3×3 convolution kernel, and a 1×1 convolution kernel for down-sampling, respectively. The two-branch CaTNet model ran at 0.1082 s/frame, faster than other deep learning models such as EfficientNetV2 s (0.3832 s/frame) and PVT t (0.1778 s/frame). Consequently, the two-branch structure can be expected to accommodate more computation while maintaining a much higher computational speed. These findings offer a design approach for building deep learning models with strong generalization capability, particularly models that reach high accuracy when trained only on easily accessible data.
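The per-frame times quoted above can be converted into throughput and relative speed with simple arithmetic. The sketch below uses only the three timings given in the abstract. The variable names are illustrative, and the hardware and batch size behind the timings are not stated in the abstract.

```python
# Per-frame inference times in seconds, as quoted in the abstract.
SECONDS_PER_FRAME = {
    "CaTNet": 0.1082,
    "EfficientNetV2 s": 0.3832,
    "PVT t": 0.1778,
}

# Throughput in frames per second is the reciprocal of the per-frame time.
throughput = {name: 1.0 / t for name, t in SECONDS_PER_FRAME.items()}

# How many times longer each baseline takes per frame than CaTNet.
speedup_vs = {
    name: t / SECONDS_PER_FRAME["CaTNet"]
    for name, t in SECONDS_PER_FRAME.items()
    if name != "CaTNet"
}

for name, fps in throughput.items():
    print(f"{name}: {fps:.2f} frames/s")
for name, ratio in speedup_vs.items():
    print(f"CaTNet is {ratio:.2f}x faster per frame than {name}")
```

On these figures CaTNet is roughly 3.5 times faster per frame than EfficientNetV2 s and roughly 1.6 times faster than PVT t.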
Keywords:image identification  agriculture  convolutional neural networks  apple leaf disease  transformer  strong generalization ability  feature fusion