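Variable selection for near-infrared spectral measurement of wheat protein based on the Monte Carlo latent projective graph method
Cite this article: Huan Kewei, Liu Xiaoxi, Zheng Feng, Cai Xiaolong, Yu Suping, Shi Xiaoguang. Variable selection for near-infrared spectral measurement of wheat protein based on the Monte Carlo latent projective graph method[J]. Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), 2013, 29(4): 266-271
Authors: Huan Kewei  Liu Xiaoxi  Zheng Feng  Cai Xiaolong  Yu Suping  Shi Xiaoguang
Affiliations (in author order): 1. College of Science, Changchun University of Science and Technology, Changchun 130022; 2. Institute of Scientific and Technical Information of Jilin Province, Changchun 130021; 1. College of Science, Changchun University of Science and Technology, Changchun 130022; 1. College of Science, Changchun University of Science and Technology, Changchun 130022; 3. Beijing Oriental Info-Technology Development Center, Beijing 100037; 1. College of Science, Changchun University of Science and Technology, Changchun 130022
Funding: National Key Technology R&D Program project (2007BAI07A00-1); 2011 Specialized Research Fund for the Doctoral Program of Higher Education, joint funding project (20112216110006); Natural Science Foundation of Jilin Province (201215144); Changchun Science and Technology Support Program project (11KZ05).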
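Abstract:
To achieve nondestructive determination of wheat protein, simplify the prediction model used in portable wheat protein detection devices, and improve the model's prediction accuracy, this paper selects wavelength variables from near-infrared diffuse transmission-reflectance spectra of wheat collected over 950-1690 nm by combining Monte Carlo sampling (MCS) with the latent projective graph (LPG) method. Following the idea of model population analysis (MPA), MCS is used to build sample subspaces, and principal component analysis (PCA) is used to obtain the LPG. Assuming that collinear spectral variables in the LPG contribute equally to modeling, a small number of wavelength variables are selected to build prediction sub-models. The sub-models with the smaller root-mean-square error of prediction (RMSEP) are retained, the frequency with which each variable appears in them is counted, and the wavelength variables with the highest frequencies are taken as the influential variables (IVs). The results show that modeling with the IVs reduces the RMSEP from 0.5245 to 0.2548, so variable selection with the Monte Carlo latent projective graph method (MC-LPG) is a feasible way to improve model prediction accuracy.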
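For reference, the RMSEP figure quoted above is conventionally defined as below. This is the standard textbook form, not a formula taken from the paper; the symbols $n_p$ (number of prediction samples), $y_i$ (reference protein value) and $\hat{y}_i$ (model prediction) are notation assumed here. The second line simply works out the relative reduction implied by the two reported values.

```latex
\mathrm{RMSEP} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}\left(\hat{y}_i - y_i\right)^2}

\frac{0.5245 - 0.2548}{0.5245} \approx 0.514
```

Under this definition, the reported change corresponds to roughly a 51% reduction in prediction error.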

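Keywords: near-infrared spectroscopy  nondestructive testing  model  variable selection  Monte Carlo sampling  latent projective graph
Received: 2012-09-16
Revised: 2013-01-18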

Variable selection of near-infrared spectroscopy for measuring wheat protein based on MC-LPG
Huan Kewei, Liu Xiaoxi, Zheng Feng, Cai Xiaolong, Yu Suping and Shi Xiaoguang. Variable selection of near-infrared spectroscopy for measuring wheat protein based on MC-LPG[J]. Transactions of the Chinese Society of Agricultural Engineering, 2013, 29(4): 266-271
Authors: Huan Kewei  Liu Xiaoxi  Zheng Feng  Cai Xiaolong  Yu Suping  Shi Xiaoguang
Affiliation: 1. College of Science, Changchun University of Science and Technology, Changchun 130022, China; 2. Institute of Scientific and Technical Information in Jilin Province, Changchun 130021, China; 3. Beijing Oriental Info-Technology Development Center, Beijing 100037, China
Abstract:
To achieve nondestructive determination of protein content in wheat, simplify the prediction model of portable wheat protein detection devices, and improve the prediction accuracy of the models, near-infrared diffuse transmission-reflectance spectra of wheat were measured from 950 to 1690 nm. Wavelength variables were selected by combining the Monte Carlo Sampling (MCS) technique with the Latent Projective Graph (LPG) method. The LPG is another expression of the principal component projective graph; it is a technique developed in Chemical Factor Analysis (CFA) for investigating the nature of hyphenated data. The latent variables (loadings) of a data matrix and the projections of objects onto the latent variables (scores) are obtained by Principal Component Analysis (PCA). The nature of the data matrix can then be analyzed through the loading and score plots, because the latent variables are linear combinations of the measured variables and the projections uniquely define the sample relations in the reduced variable space spanned by the latent variables. The LPG is therefore adopted for wavelength selection in Near-Infrared (NIR) spectral analysis: the loading matrix is used to state the relationship among different samples, and the score matrix is used to select the wavelength variables. In Model Population Analysis (MPA), sub-datasets are first obtained by MCS, and a sub-model is then built for each sub-dataset. Finally, the parameters that contribute to building the sub-models are analyzed statistically across the sample space, variable space, parameter space and model space. Accordingly, following MPA, 500 sub-datasets of samples were established by MCS. In each sub-dataset the ratio of calibration to prediction samples is approximately 2:1, with 61 kinds of wheat used for calibration and 32 kinds for prediction.
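The sampling and projection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `monte_carlo_splits` and `pca_scores_loadings`, the synthetic spectra, the 40 wavelength columns and the random seeds are all hypothetical. Only the 500 sub-datasets and the 61/32 split come from the abstract. The sketch uses the common samples-by-wavelengths layout, in which each row of `loadings` corresponds to a wavelength; the abstract does not say how the data matrix was oriented, so its assignment of roles to "loading" and "score" may reflect a transposed matrix.

```python
import numpy as np

def monte_carlo_splits(n_samples, n_splits=500, n_cal=61, seed=0):
    """Yield (calibration_idx, prediction_idx) pairs from random permutations."""
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)
        yield perm[:n_cal], perm[n_cal:]

def pca_scores_loadings(X, n_components=2):
    """PCA via SVD of the column-centred matrix X (samples x wavelengths)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # one row per sample
    loadings = Vt[:n_components].T                   # one row per wavelength
    return scores, loadings

# Synthetic stand-in for the spectra: 93 samples, 40 hypothetical wavelengths.
rng = np.random.default_rng(1)
X = rng.normal(size=(93, 40))

splits = list(monte_carlo_splits(n_samples=93))
cal, pred = splits[0]
scores, loadings = pca_scores_loadings(X[cal])
```

Plotting the first two columns of `loadings` against each other gives one kind of projective graph in which wavelengths lying along the same direction from the origin are collinear; choosing representatives from such groups is the step the abstract describes only in outline, so it is left out here.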
The LPG was obtained by PCA. Assuming that collinear spectral variables in the LPG contribute equally to modeling, a small number of wavelength variables were selected to build 500 prediction sub-models, and the 458 sub-models whose root mean square error of prediction (RMSEP) was smaller than 0.55 were retained. The frequency with which each selected variable appeared in these 458 sub-models was counted, and the 12 wavelengths with the highest frequency were chosen as the influential variables (IVs): 1060, 1094, 1403, 1494, 1511, 1521, 1545, 1551, 1607, 1612, 1620, and 1630 nm. The new model built on the IVs reduced the RMSEP from 0.5245 to 0.2548 and increased the RPD from 1.7496 to 3.3985. Therefore, variable selection with the Monte Carlo Sampling technique and the Latent Projective Graph method (MC-LPG) is feasible for improving the precision of the prediction model.
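The filtering and frequency-counting step, together with the two reported metrics, might look like the sketch below. It is illustrative only: the helper names (`rmsep`, `rpd`, `influential_variables`) and the toy sub-model list are hypothetical, and the RPD shown is one common definition (standard deviation of the reference values divided by RMSEP), which the abstract does not confirm. The 0.55 threshold and the choice of the 12 most frequent wavelengths come from the abstract.

```python
from collections import Counter
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def rpd(y_true, y_pred):
    """Sample SD of the reference values divided by RMSEP (assumed definition)."""
    return float(np.std(np.asarray(y_true, dtype=float), ddof=1) / rmsep(y_true, y_pred))

def influential_variables(sub_models, rmsep_threshold=0.55, n_top=12):
    """sub_models: iterable of (selected_wavelengths, rmsep_value) pairs.

    Keeps sub-models below the threshold, counts how often each wavelength
    appears in them, and returns the n_top most frequent ones (sorted)
    plus the number of sub-models kept.
    """
    kept = [wl for wl, err in sub_models if err < rmsep_threshold]
    counts = Counter(w for wl in kept for w in wl)
    top = [w for w, _ in counts.most_common(n_top)]
    return sorted(top), len(kept)

# Toy input: four hypothetical sub-models, one of which fails the threshold.
sub_models = [
    ((1060, 1403, 1511), 0.30),
    ((1060, 1511, 1620), 0.41),
    ((1060, 1200, 1403), 0.52),
    ((1100, 1200, 1300), 0.70),  # rejected: RMSEP is not below 0.55
]
ivs, n_kept = influential_variables(sub_models, n_top=3)
print(ivs, n_kept)  # [1060, 1403, 1511] 3
```

In the study the same counting would run over the 458 retained sub-models with `n_top=12`.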
Keywords: near-infrared spectroscopy   nondestructive examination   models   variable selection   Monte Carlo sampling   latent projective graph
This article is indexed in CNKI and other databases.