Yingqiang Song, Feng Wang, Weihao Yang, Ruilin Liang, Dexi Zhan, Meiyan Xiang, Xiaohang Yang, Rui Xu, Miao Lu
Computers and Electronics in Agriculture, Volume 236, Article 110490. Published 2025-05-08. DOI: 10.1016/j.compag.2025.110490. URL: https://www.sciencedirect.com/science/article/pii/S0168169925005964
High-performance prediction of soil organic carbon using automatic hyperparameter optimization method in the Yellow River Delta of China
Using machine learning (ML) and deep learning (DL) models to predict the spatial variability of soil organic carbon (SOC) is crucial for advancing carbon emission reduction strategies. However, inadequate hyperparameter tuning remains a key limitation, reducing model fitting performance and prediction accuracy. Notably, high-performance models enabled by automatic hyperparameter optimization (AHPO) represent a novel approach to explaining the complex relationships between environmental factors and SOC. In this study, we analyzed the prediction performance of ML models, namely gradient boosting decision tree (GBDT) and extreme gradient boosting (XGB), and DL models, namely deep forest (DF) and convolutional neural network (CNN). These models were optimized using nature-inspired algorithms (grey wolf optimization (GWO) and hunter-prey optimization (HPO)) and mathematical-approximation algorithms (Bayesian optimization (BO) and tree-structured Parzen estimator (TPE)). Furthermore, we derived the linear and nonlinear driving effects of environmental factors (soil, vegetation, texture, climate, and terrain) on SOC. We also identified direct and indirect response pathways using SHapley Additive exPlanations (SHAP), variogram decomposition (VD), hierarchical partitioning (HP), and structural equation modeling (SEM). Our results show that prediction models optimized with mathematical-approximation algorithms, such as BO-DF (R² = 0.76) and TPE-DF (R² = 0.82), demonstrated the strongest nonlinear fitting ability between environmental factors and SOC. AHPO algorithms significantly improved the prediction performance of DL models, with R² values for the four optimization methods increasing from 0.72 to 0.82. The generalization verification results indicate that the TPE-optimized model is robust and achieves the highest accuracy (R² > 0.7) for SOC prediction.
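As an illustration of the nature-inspired search mentioned above, the following is a minimal sketch of grey wolf optimization over two hyperparameters. It is not the authors' implementation: the search space (`learning_rate`, `max_depth`), the function `toy_validation_error`, and all settings are hypothetical, and the objective is a synthetic stand-in for a cross-validated model error. The leaders here are kept as the three best positions seen so far, which is one common way to write the algorithm.

```python
import math
import random

# Hypothetical search space: (lower, upper) bounds for each hyperparameter.
BOUNDS = [(0.01, 0.5),   # learning_rate
          (1.0, 10.0)]   # max_depth (rounded to an integer when evaluated)


def toy_validation_error(position):
    """Synthetic stand-in for a validation error; zero at lr=0.1, depth=6."""
    learning_rate, depth = position[0], round(position[1])
    return (math.log10(learning_rate) + 1.0) ** 2 + 0.05 * (depth - 6) ** 2


def clip(value, low, high):
    """Keep a coordinate inside its search bounds."""
    return max(low, min(high, value))


def grey_wolf_search(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(n_wolves)]
    leaders = []  # best three positions seen so far (alpha, beta, delta)
    for it in range(n_iter):
        # Merge the current pack into the leader pool and keep the top three.
        pool = leaders + [list(w) for w in wolves]
        leaders = sorted(pool, key=objective)[:3]
        # `a` decays linearly from 2 to 0: exploration first, exploitation later.
        a = 2.0 * (1.0 - it / n_iter)
        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                pulled = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    pulled += leader[d] - A * distance
                # New coordinate is the average of the three leader-guided moves.
                wolf[d] = clip(pulled / 3.0, lo, hi)
    # Include the final pack positions before reporting the best one.
    pool = leaders + [list(w) for w in wolves]
    best = min(pool, key=objective)
    return best, objective(best)


best_pos, best_score = grey_wolf_search(toy_validation_error, BOUNDS)
print("learning_rate:", best_pos[0], "max_depth:", round(best_pos[1]))
print("toy validation error:", best_score)
```

In a real workflow the objective would train a model such as DF or XGB with the candidate hyperparameters and return a validation error (for example, 1 − R²), which makes each evaluation far more expensive than in this toy setting.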
The AHPO prediction model’s hyperparameter combination achieves a balance between similarity and distinctiveness, where key performance-determining hyperparameters exhibit significant variation (i.e., non-similarity), enabling high-performance SOC predictions. Spatial mapping with the TPE-DF model revealed that areas with high SOC content are primarily concentrated in the southern and northeastern regions of the study area. Moreover, when the model’s prediction accuracy (R²) exceeds 0.75, SHAP analysis identifies SoilAN, SoilAP, SoilAK, TMP, and PRE as the most influential environmental factors driving nonlinear changes in SOC. Similarly, VD and HP analyses highlight a synergistic linear contribution of soil and climate factors, accounting for 99.1% of the variability in SOC. The path analysis further indicates that regional climate warming leads to surface soil desiccation and salinization, which significantly alter the SOC decomposition environment. High salt stress negatively affects microorganisms and crop root activity, ultimately enhancing SOC accumulation in surface soil. Overall, AHPO-empowered ML and DL methods exhibit strong feasibility for analyzing the response relationship between environmental factors and SOC. These methods therefore provide robust support for high-performance and high-precision SOC monitoring across spatial scales.
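To make the SHAP attribution idea concrete, here is a minimal sketch that computes exact Shapley values for a three-feature toy model. Everything in it is assumed for illustration: `toy_soc_model`, its coefficients, the instance, and the all-zero baseline are invented and are not taken from the study, and absent features are replaced by a single baseline value, whereas SHAP implementations typically average over a background dataset and use faster approximations.

```python
from itertools import combinations
from math import factorial


def toy_soc_model(values):
    """Invented model: one additive term plus one interaction term."""
    return 2.0 * values["SoilAN"] + values["TMP"] * values["PRE"]


def coalition_value(model, instance, baseline, coalition):
    """Evaluate the model with coalition features taken from the instance
    and all remaining features taken from the baseline."""
    mixed = {name: (instance[name] if name in coalition else baseline[name])
             for name in instance}
    return model(mixed)


def exact_shapley(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    names = list(instance)
    n = len(names)
    phi = {}
    for target in names:
        others = [name for name in names if name != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_target = coalition_value(
                    model, instance, baseline, set(subset) | {target})
                without_target = coalition_value(
                    model, instance, baseline, set(subset))
                total += weight * (with_target - without_target)
        phi[target] = total
    return phi


instance = {"SoilAN": 1.0, "TMP": 2.0, "PRE": 3.0}
baseline = {"SoilAN": 0.0, "TMP": 0.0, "PRE": 0.0}
phi = exact_shapley(toy_soc_model, instance, baseline)
print(phi)
```

For this toy model the additive term is attributed entirely to SoilAN, and the interaction term is split equally between TMP and PRE. The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the efficiency property that SHAP relies on.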
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.