
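Prediction of the spatial distribution of soil organic matter based on two-point machine learning method
Citation: Wang Yuxue, Yang Ke, Gao Bingbo, Feng Aiping, Tian Juan, Jiang Chuanliang, Yang Jianyu. Prediction of the spatial distribution of soil organic matter based on two-point machine learning method[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(12): 65-73. DOI: 10.11975/j.issn.1002-6819.2022.12.008
Authors: Wang Yuxue, Yang Ke, Gao Bingbo, Feng Aiping, Tian Juan, Jiang Chuanliang, Yang Jianyu
Affiliations: 1. College of Land Science and Technology, China Agricultural University, Beijing 100193; 2. Key Laboratory of Remote Sensing of Agricultural Disasters, Ministry of Agriculture and Rural Affairs, Beijing 100083; 3. Harbin Natural Resources Comprehensive Survey Center, China Geological Survey, Harbin 150080; 4. Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000; 5. Center for Satellite Application on Ecology and Environment, Ministry of Ecology and Environment, Beijing 100094
Funding: National Key Research and Development Program of China (2021YFE0102300); Survey of the Surface Substrate Layer of Black Soil in the Hailun Area of the Songnen Plain (DD20211589)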
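Abstract: Accurately predicting the spatial distribution of soil organic matter (SOM) is important for precision agriculture, cultivated land quality improvement, ecological and environmental protection, and carbon sequestration and emission reduction. This study explores the feasibility of improving SOM spatial prediction with a two-point machine learning (TPML) method. Hailun City, Heilongjiang Province, was taken as the study area, with climate, topography and landforms, socio-economic factors, and spatial location as auxiliary variables. TPML makes full use of spatial location information and attribute similarity to handle both the heterogeneity of the SOM spatial distribution and the heterogeneity of its relationships with the auxiliary variables, thereby improving the accuracy of SOM spatial prediction. Random forest, random-forest-based regression kriging, inverse distance weighting, and ordinary kriging (OK) were used as benchmarks. Multiple comparison experiments with different sample sizes were conducted, and the prediction accuracy of each method was evaluated with the mean absolute error (MAE), the root mean square error (RMSE), the correlation coefficient between predicted and observed values (r), and the coefficient of determination (R²). The results show that: 1) SOM content in the study area ranged from 1.775 to 7.188 g/kg, with a mean of 3.179 g/kg; its spatial distribution was uneven, with higher values in the east and lower values in the west. 2) Under every sample size, TPML achieved the highest prediction accuracy of all models, with the lowest MAE (0.088-0.097 g/kg) and RMSE (0.116-0.139 g/kg) and the highest r (0.992-0.996) and R² (0.971-0.985). 3) The standard deviation of the prediction error (the theoretical error) showed a spatial pattern similar to that of the actual error, indicating that TPML can provide a reasonable estimate of prediction uncertainty. In summary, TPML can improve prediction accuracy by exploiting spatial autocorrelation and attribute similarity simultaneously, and it is suitable for predicting resource and environmental variables that exhibit some spatial autocorrelation and for which auxiliary data are available.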
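The four accuracy measures named above (MAE, RMSE, r, R²) can be computed from paired observed and predicted SOM values as in the minimal pure-Python sketch below. It is an illustration, not the authors' evaluation code: the function name `accuracy_metrics` is hypothetical, and because the abstract does not say how R² was computed, the sketch assumes the common definition R² = 1 − SS_res/SS_tot, which, unlike r², can become negative for poor predictions.

```python
import math
from typing import Dict, Sequence


def accuracy_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: MAE, RMSE, Pearson r and R^2 for paired values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Prediction errors (predicted minus observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Pearson correlation between observed and predicted values
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    denom = math.sqrt(ss_obs * ss_pred)
    r = cov / denom if denom > 0 else float("nan")  # constant predictions -> r undefined

    # Assumed definition: R^2 = 1 - SS_res / SS_tot
    r2 = 1.0 - ss_res / ss_obs
    return {"MAE": mae, "RMSE": rmse, "r": r, "R2": r2}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study
    print(accuracy_metrics([3.0, 3.5, 4.0, 2.5], [3.1, 3.4, 4.2, 2.5]))
```

Reporting r and R² side by side is informative because r only measures linear association, while R² under this definition also penalizes systematic bias in the predictions.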

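Keywords: soil; organic matter; random forest; spatial distribution prediction; spatial autocorrelation; attribute similarity; two-point machine learning
Received: 2022-04-06
Revised: 2022-05-22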

Prediction of the spatial distribution of soil organic matter based on two-point machine learning method
Wang Yuxue, Yang Ke, Gao Bingbo, Feng Aiping, Tian Juan, Jiang Chuanliang, Yang Jianyu. Prediction of the spatial distribution of soil organic matter based on two-point machine learning method[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(12): 65-73. DOI: 10.11975/j.issn.1002-6819.2022.12.008
Authors: Wang Yuxue, Yang Ke, Gao Bingbo, Feng Aiping, Tian Juan, Jiang Chuanliang, Yang Jianyu
Affiliation: 1. College of Land Science and Technology, China Agricultural University, Beijing 100193, China; 2. Key Laboratory of Remote Sensing of Agricultural Disasters, Ministry of Agriculture and Rural Affairs, Beijing 100083, China; 3. Harbin Natural Resources Comprehensive Survey Center, China Geological Survey, Harbin 150080, China; 4. Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China; 5. Ministry of Ecology and Environment Center for Satellite Application on Ecology and Environment, Beijing 100094, China
Abstract: An accurate prediction of the spatial distribution of Soil Organic Matter (SOM) is of great importance for precision agriculture, farmland quality improvement, ecological and environmental protection, and soil carbon sequestration. However, prediction accuracy is strongly affected by the heterogeneity of the SOM spatial distribution and of its relationships with auxiliary variables. Taking Hailun City, Heilongjiang Province, northeast China (126°14′-127°45′ E, 48°58′-47°52′ N) as the study area, this study aims to accurately and rapidly predict the spatial distribution of SOM using a Two-Point Machine Learning method (TPML), with climate, topography, socio-economic factors, and spatial location as auxiliary variables. Spatial location and auxiliary variables were integrated to deal effectively with the heterogeneity of the SOM spatial distribution and of its relationships with the auxiliary variables. TPML was then compared with Random Forest (RF), RF regression kriging, inverse distance weighting, and Ordinary Kriging (OK). All models were evaluated with samples of different sizes using the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), the correlation coefficient between predicted and true values (r), and the coefficient of determination (R²). The results reveal that: 1) SOM content in the study area ranged from 1.775 to 7.188 g/kg, with an average of 3.179 g/kg. Its spatial distribution was uneven, with higher values in the east and lower values in the west. SOM content was positively correlated with the normalized difference vegetation index (NDVI), elevation, and mean annual precipitation, and negatively correlated with gross domestic product, mean annual air temperature, and the topographic wetness index; it was also significantly related to land use, landform, vegetation, and soil type.
2) TPML achieved the highest prediction accuracy under all sample sizes, with the lowest MAE (0.088-0.097 g/kg) and RMSE (0.116-0.139 g/kg) and the highest r (0.992-0.996) and R² (0.971-0.985). Compared with the widely used OK, TPML reduced MAE and RMSE by more than 0.7 g/kg and increased r and R² by more than 0.2 and 0.9, respectively. 3) The standard deviation of the prediction errors (theoretical error) showed a spatial pattern similar to that of the actual errors, indicating that TPML provides reasonable uncertainty estimates for its predictions. In conclusion, TPML can improve spatial prediction accuracy by exploiting spatial autocorrelation and attribute similarity simultaneously, and it is suitable for predicting resource and environmental variables that exhibit a certain degree of spatial autocorrelation and for which auxiliary data are available.
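Inverse distance weighting (IDW) is one of the baselines TPML was compared against. The abstract does not report the IDW settings used, so the sketch below assumes the common power parameter of 2, an optional k-nearest-neighbour cutoff, and planar (projected) coordinates; the function name `idw_predict` and these defaults are illustrative assumptions, not the study's configuration.

```python
import math
from typing import Optional, Sequence, Tuple

# (x, y, value): projected coordinates and the measured attribute, e.g. SOM
Sample = Tuple[float, float, float]


def idw_predict(samples: Sequence[Sample], target: Tuple[float, float],
                power: float = 2.0, k: Optional[int] = None) -> float:
    """Hypothetical helper: IDW estimate at `target` with weights 1 / d**power."""
    if not samples:
        raise ValueError("no samples")
    if k is not None and k < 1:
        raise ValueError("k must be at least 1")

    tx, ty = target
    # Distances from the target to every sample, nearest first
    dists = sorted(((math.hypot(x - tx, y - ty), v) for x, y, v in samples),
                   key=lambda dv: dv[0])
    if k is not None:
        dists = dists[:k]  # keep only the k nearest samples

    # IDW is an exact interpolator: a sample at the target is returned as-is
    if dists[0][0] == 0.0:
        return dists[0][1]

    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study
    pts = [(0.0, 0.0, 2.0), (3.0, 0.0, 5.0)]
    print(idw_predict(pts, (1.0, 0.0)))
```

IDW uses only sample locations and values; unlike TPML and the RF-based models, it does not draw on auxiliary variables such as NDVI, elevation, or precipitation.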
Keywords: soils; organic matter; random forest; spatial distribution prediction; spatial auto-correlation; attribute similarity; two-point machine learning