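High-performance prediction of soil organic carbon using automatic hyperparameter optimization method in the Yellow River Delta of China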

IF 7.7 | Zone 1 (Agricultural and Forestry Sciences) | JCR Q1 (AGRICULTURE, MULTIDISCIPLINARY)
Yingqiang Song, Feng Wang, Weihao Yang, Ruilin Liang, Dexi Zhan, Meiyan Xiang, Xiaohang Yang, Rui Xu, Miao Lu
Computers and Electronics in Agriculture, Volume 236, Article 110490. Journal article, published 2025-05-08.
DOI: 10.1016/j.compag.2025.110490
Full text: https://www.sciencedirect.com/science/article/pii/S0168169925005964
Citations: 0

Abstract

Using machine learning (ML) and deep learning (DL) models to predict the spatial variability of soil organic carbon (SOC) is crucial for advancing carbon emission reduction strategies. However, inadequate hyperparameter tuning remains a key limitation, reducing model fitting performance and prediction accuracy. Notably, high-performance models enabled by automatic hyperparameter optimization (AHPO) represent a novel approach to explaining the complex relationships between environmental factors and SOC. In this study, we analyzed the prediction performance of ML models, namely gradient boosting decision tree (GBDT) and extreme gradient boosting (XGB), and DL models, namely deep forest (DF) and convolutional neural network (CNN). These models were optimized using nature-inspired algorithms (grey wolf optimization (GWO) and hunter-prey optimization (HPO)) and mathematical-approximation algorithms (Bayesian optimization (BO) and tree-structured Parzen estimator (TPE)). Furthermore, we derived the linear and nonlinear driving effects of environmental factors (soil, vegetation, texture, climate, and terrain) on SOC. We also identified direct and indirect response pathways using SHapley Additive exPlanations (SHAP), variogram decomposition (VD), hierarchical partitioning (HP), and structural equation modeling (SEM). Our results show that prediction models optimized with mathematical-approximation algorithms, such as BO-DF (R² = 0.76) and TPE-DF (R² = 0.82), demonstrated the strongest nonlinear fitting ability between environmental factors and SOC. AHPO algorithms significantly improved the prediction performance of DL models, with R² values for the four optimization methods increasing from 0.72 to 0.82. The generalization verification results indicate that the TPE-optimized model demonstrates strong robustness and achieves the highest accuracy (R² > 0.7) for SOC prediction. The hyperparameter combinations of the AHPO prediction models balance similarity and distinctiveness: the key performance-determining hyperparameters vary significantly between models (i.e., they are non-similar), which enables high-performance SOC predictions. Spatial mapping with the TPE-DF model revealed that areas with high SOC content are primarily concentrated in the southern and northeastern regions of the study area. Moreover, when the model's prediction accuracy (R²) exceeds 0.75, SHAP analysis identifies SoilAN, SoilAP, SoilAK, TMP, and PRE as the most influential environmental factors driving nonlinear changes in SOC. Similarly, VD and HP analyses highlight a synergistic linear contribution of soil and climate factors, accounting for 99.1% of the variability in SOC. Interestingly, the path analysis further indicates that regional climate warming leads to surface soil desiccation and salinization, which significantly alters the SOC decomposition environment. High salt stress negatively affects microorganisms and crop root activity, ultimately enhancing SOC accumulation in surface soil. Overall, AHPO-empowered ML and DL methods exhibit strong feasibility for analyzing the response relationship between environmental factors and SOC. Therefore, these methods provide robust support for high-performance and high-precision SOC monitoring across spatial scales.
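
To make the idea of nature-inspired hyperparameter search more concrete, the following is a minimal sketch of a basic grey wolf optimizer in plain Python. It is not the authors' implementation. The function names (`grey_wolf_optimize`, `toy_validation_error`), the two hyperparameters (log10 learning rate and tree depth), their bounds, and the quadratic objective are all hypothetical stand-ins; in a real study the objective would presumably be a cross-validated error of a model such as GBDT or DF.

```python
import random


def toy_validation_error(params):
    """Hypothetical stand-in for a cross-validated error surface.

    params = [log10(learning_rate), max_depth]; the minimum is placed
    at (-1.5, 6) purely for illustration.
    """
    log_lr, depth = params
    return (log_lr + 1.5) ** 2 + 0.1 * (depth - 6.0) ** 2


def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimize `objective` inside box `bounds` with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")

    for it in range(n_iters):
        # Rank the pack; the three best wolves (alpha, beta, delta) lead the search.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # copies, so in-place updates below do not move them
        alpha_val = objective(leaders[0])
        if alpha_val < best_val:
            best_pos, best_val = list(leaders[0]), alpha_val

        a = 2.0 * (1.0 - it / n_iters)  # exploration factor decays linearly from 2 toward 0
        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # step scale; |A| > 1 favors exploration
                    C = 2.0 * rng.random()           # random weight on the leader position
                    dist = abs(C * leader[d] - wolf[d])
                    pulls.append(leader[d] - A * dist)
                # Move to the average of the three leader-guided positions, clipped to the box.
                wolf[d] = min(hi, max(lo, sum(pulls) / 3.0))

    # Account for the positions produced by the final update.
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val


search_bounds = [(-4.0, 0.0), (2.0, 12.0)]  # assumed ranges for log10(lr) and depth
best_pos, best_val = grey_wolf_optimize(toy_validation_error, search_bounds)
print("best hyperparameters:", best_pos, "objective:", best_val)
```

The same loop structure would apply to any box-constrained search space; only the objective changes. Model-based methods such as BO and TPE differ in that they fit a surrogate of the objective from past trials instead of moving a population.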
Source journal
Computers and Electronics in Agriculture (Engineering and Technology: Computer Science, Interdisciplinary Applications)
CiteScore: 15.30
Self-citation rate: 14.50%
Articles published: 800
Review time: 62 days
About the journal: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.