
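Prediction of moisture content in cigar tobacco leaves during the drying process based on random forest feature selection
Cite this article: XING Zhuoran, ZHANG Kai, LIU Xudong, MA Ming, LIU Bingyang, DING Songshuang, SHI Yaqi, AN Jipeng, GAO Haojie, SHI Xiangdong. Prediction of moisture content in cigar tobacco leaves during the drying process based on random forest feature selection[J]. Transactions of the Chinese Society of Agricultural Engineering, 2024, 40(7): 343-354.
Authors: XING Zhuoran  ZHANG Kai  LIU Xudong  MA Ming  LIU Bingyang  DING Songshuang  SHI Yaqi  AN Jipeng  GAO Haojie  SHI Xiangdong
Affiliations: College of Tobacco Science, Henan Agricultural University / National Tobacco Cultivation Physiology and Biochemistry Research Base / Key Laboratory for Tobacco Cultivation of the Tobacco Industry, Zhengzhou 450002; Huaihua Company of Hunan Tobacco Company, Huaihua 418000; Zhengzhou Tobacco Research Institute of China National Tobacco Corporation, Zhengzhou 450001; Cigar Research Institute, China Tobacco Anhui Industrial Co., Ltd., Hefei 230088
Funding: National Natural Science Foundation of China (32101851); Henan Province Key Research, Development and Promotion Special Project (222102110163); Research and Application of Cigar Tobacco Leaf Air-Curing Technology in Hunan (HN2022KJ03)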
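Abstract: Manual judgment of the moisture content of cigar tobacco leaves during air-curing is subjective and inaccurate, and the apparent features that matter most for predicting moisture content during air-curing have not been clearly identified. To address these problems, this study used image feature extraction and machine learning to predict the moisture content of cigar tobacco leaves during air-curing. The cigar tobacco variety "Yunxue-2" was used as the test material. Four categories of features (color, contour, texture, and leaf position) were obtained from leaf images taken during air-curing, and an optimal image feature subset for moisture content prediction was selected from them. On this basis, random forest (RF), support vector regression (SVR), and back propagation neural network (BPNN) models were built, and a genetic algorithm (GA) was used to optimize the hyperparameters of each model. The original image feature set and the optimal image feature subset were each fed to the three machine learning models, giving six model-feature combination schemes. The original dataset was split according to air-curing stage, and predictions were made on the test set. The results showed that the GA-SVR model combined with the optimal image feature subset performed best on the test set, with a coefficient of determination (r²) of 0.980 and a mean square error (MSE) of 0.001, and had the shortest runtime (0.128 s). The results can provide a theoretical basis for intelligent control of the cigar tobacco leaf air-curing process.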
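As a reading aid, the two reported metrics (coefficient of determination and mean square error) can be computed as in the minimal sketch below. It uses the textbook definitions; the function names and the moisture values are made up for illustration and are not data from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("r-squared is undefined when all measured values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical moisture fractions (illustrative only, not from the paper).
measured = [0.80, 0.60, 0.40, 0.20]
predicted = [0.78, 0.63, 0.38, 0.21]

mse = mean_squared_error(measured, predicted)
r2 = r_squared(measured, predicted)
print(f"MSE = {mse:.5f}, r2 = {r2:.3f}")
```

A low MSE together with an r² close to 1 means the predictions track the measured moisture content closely, which is how the reported values of 0.001 and 0.980 should be read.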

Keywords: image processing  moisture content  random forest  feature selection  cigar leaf
Received: 2023/11/10
Revised: 2023/12/22

Prediction of moisture content in cigar tobacco leaves during the drying process based on random forest feature selection
XING Zhuoran,ZHANG Kai,LIU Xudong,MA Ming,LIU Bingyang,DING Songshuang,SHI Yaqi,AN Jipeng,GAO Haojie,SHI Xiangdong.Prediction of moisture content in cigar tobacco leaves during the drying process based on random forest feature selection[J].Transactions of the Chinese Society of Agricultural Engineering,2024,40(7):343-354.
Authors:XING Zhuoran  ZHANG Kai  LIU Xudong  MA Ming  LIU Bingyang  DING Songshuang  SHI Yaqi  AN Jipeng  GAO Haojie  SHI Xiangdong
Abstract: Air-curing is one of the most important stages in the production of cigar leaves, and the appearance quality that develops during this stage is an indicator of intrinsic quality. For proper browning, the temperature and humidity inside the curing chamber can be adjusted in real time according to the moisture content of the leaves. At present, however, leaf moisture content is usually judged by manual experience, which is subjective and inaccurate. Computer vision has been increasingly used to assess the quality of agricultural products in recent years because of its simplicity and flexibility. Random forest (RF), a bagging-based ensemble learning method, handles high-dimensional data efficiently, with high precision and fast training and prediction. In this study, prediction models for the moisture content of cigar leaves were established using RF-based feature selection and machine learning. The cigar tobacco variety "Yunxue-2" was taken as the research object. Images of cigar leaves were first collected during air-curing in order to extract the apparent features that are important for determining moisture content. Color thresholding and OTSU segmentation were combined to obtain the leaf region of interest (ROI). Four categories of features were then extracted: color, contour, texture, and leaf position. Correlation coefficient analysis was used to eliminate highly correlated features within each category, in order to prevent "dimension explosion." Out-of-bag (OOB) data were then used to compute the average decrease in the coefficient of determination (Decr2), and the image features were ranked by this importance measure. The prediction accuracy and runtime of the RF model were compared for different numbers of features.
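The correlation-based elimination step described above can be illustrated with a minimal sketch. The abstract does not state the correlation threshold or which feature of a correlated pair was retained, so the 0.9 cutoff, the keep-the-first-seen rule, the feature names, and the values below are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant feature as uncorrelated
    return cov / sqrt(var_x * var_y)


def prune_correlated(features, threshold=0.9):
    """Greedy pruning within one feature category.

    A feature is kept only if its absolute correlation with every
    already-kept feature does not exceed the threshold (assumed value).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical color features measured on five leaf images.
color_features = {
    "mean_R": [120, 110, 95, 80, 60],
    "mean_G": [118, 109, 96, 79, 62],  # nearly collinear with mean_R
    "hue_std": [5, 9, 4, 8, 6],
}

kept = prune_correlated(color_features, threshold=0.9)
print(kept)
```

Applying the pruning separately to each category (color, contour, texture, leaf position) mirrors the "within each category" wording of the abstract; the surviving features would then go on to the importance ranking.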
The seven image features most closely related to the moisture content of cigar tobacco leaves were selected as the optimal feature subset. The original feature set and the optimal subset were then used to evaluate RF, support vector regression (SVR), and back propagation neural network (BPNN) models, with a genetic algorithm (GA) optimizing the hyperparameters of each model. Combining the three models with the two feature sets gave six model-feature schemes. Five-fold cross-validation was used to compare their prediction accuracy and generalization, and the six schemes were then verified on a test set from the air-curing process. The results showed that the combination of color, contour, texture, and leaf position features effectively characterized the changes in leaf appearance as moisture was lost. Under five-fold cross-validation, SVR and BPNN performed better with the optimal feature subset than with the original feature set, whereas RF performed better on the original feature set, which suggests that RF is less affected by information redundancy in high-dimensional data. The best performance on the test set was achieved by the GA-SVR model with the optimal feature subset, with an r² of 0.980, an MSE of 0.001, and the shortest runtime (0.128 s). In summary, image features of cigar tobacco leaves can be used to accurately predict the moisture content of leaves from different positions throughout air-curing. The findings can provide a theoretical basis for intelligent air-curing of cigar tobacco leaves.
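The five-fold cross-validation used to compare the schemes can be sketched as follows. This is only an outline of the procedure: a one-variable least-squares line stands in for the RF, SVR, and BPNN models, and the data, seed, and function names are assumptions rather than details from the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists that partition range(n_samples) into k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_mse(x, y, k=5, seed=0):
    """Average test-fold MSE of the stand-in linear model over k folds."""
    fold_mse = []
    for train, test in k_fold_indices(len(x), k, seed):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in test]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k


# Synthetic data: one made-up image feature and an exactly linear moisture response.
x = [float(i) for i in range(20)]
y = [0.9 - 0.04 * v for v in x]

cv_mse = cross_validated_mse(x, y)
print(f"5-fold CV MSE = {cv_mse:.3e}")
```

Running the same loop once per model-feature scheme and comparing the averaged scores gives the kind of comparison the abstract describes, with each sample used for testing exactly once.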
Keywords:image processing  moisture content  random forest  feature selection  cigar leaf