Soil organic carbon (SOC) prediction using super learner algorithm based on the remote sensing variables

Q2 Environmental Science

Environmental Challenges, Pub Date: 2025-04-21, DOI: 10.1016/j.envc.2025.101160

Yeonpyeong Jo, Palash Panja, Hanseup Kim, Milind Deo

Environmental Challenges, volume 19, Article 101160. https://www.sciencedirect.com/science/article/pii/S2667010025000794

Citations: 0

Abstract

Carbon uptake by soil, and its accurate monitoring, are crucial for crop production and for mitigating global warming through increased carbon sequestration. Soil organic carbon (SOC) prediction with machine learning has been actively researched because these techniques handle non-linear relationships and predict accurately with limited prior assumptions about the underlying mechanisms. However, the choice of an appropriate machine learning method remains a subject of debate: each study area has unique data patterns, so prediction performance varies across algorithm types. To address this challenge, a super learner algorithm was used to predict SOC with data from four U.S. states: Arkansas, Idaho, Nebraska, and Utah. Remote sensing variables derived from Sentinel-2 and ALOS PALSAR were used as predictors, with feature selection applied. The linear regression-based super learner achieved higher accuracy (nRMSE: 7.6 %, R²: 0.804) than the random forest-based model (nRMSE: 8.3 %, R²: 0.768), likely because careful base learner selection and hyperparameter optimization let it better capture the specific data patterns. In contrast, the accuracy of the random forest-based model varied little across different base learner combinations. Both models were then used to predict SOC at new locations in Salt Lake City, Utah, where the linear regression-based model gave the more accurate predictions (nRMSE: 52.9 %, RMSE: 0.48 % OC). By informing the selection of ML algorithms, this study supports more reliable monitoring of SOC under varied environmental conditions and, through accurate SOC quantification, the development of strategies for addressing climate change and for agricultural production.
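The comparison described in the abstract can be illustrated with a minimal stacking sketch. It is not the authors' implementation. It assumes that "linear regression-based" and "random forest-based" refer to the meta-learner that combines the base learners' cross-validated predictions, that scikit-learn's StackingRegressor is an acceptable stand-in for a super learner, and that nRMSE means RMSE divided by the observed range (the abstract does not give the normalization). The base learners, the hyperparameters, and the synthetic data that replaces the Sentinel-2 and ALOS PALSAR predictors are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor


def nrmse_percent(y_true, y_pred):
    """RMSE divided by the observed range, in percent (assumed normalization)."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return 100.0 * rmse / (np.max(y_true) - np.min(y_true))


def build_super_learner(meta_learner):
    """Stack hypothetical base learners under the given meta-learner."""
    base_learners = [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("knn", KNeighborsRegressor(n_neighbors=5)),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base predictions.
    return StackingRegressor(
        estimators=base_learners, final_estimator=meta_learner, cv=5
    )


# Synthetic stand-in for remote sensing predictors and SOC targets.
X, y = make_regression(
    n_samples=300, n_features=8, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

meta_learners = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, meta in meta_learners.items():
    model = build_super_learner(meta).fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "nrmse": nrmse_percent(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: nRMSE={scores[name]['nrmse']:.1f} %, R2={scores[name]['r2']:.3f}")
```

The numbers this prints come from synthetic data and say nothing about the study's results; the sketch only shows where the choice of meta-learner enters and how the two reported metrics could be computed under the stated assumptions.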


