Soil organic carbon retrieval using a machine learning approach from satellite and environmental covariates in the Lower Brazos River Watershed, Texas, USA

Impact Factor 3.2; JCR Q2 (Computer Science, Interdisciplinary Applications)
Birhan Getachew Tikuye, Ram Lakhan Ray
Applied Computing and Geosciences, vol. 26, Article 100252. Published 2025-05-16.
DOI: 10.1016/j.acags.2025.100252
https://www.sciencedirect.com/science/article/pii/S2590197425000345

Abstract

Soil is critical to global carbon storage, holding more carbon than terrestrial vegetation and the atmosphere combined. Accurate soil organic carbon (SOC) estimation is essential for improving agricultural productivity and mitigating climate change. This study explores SOC retrieval with a machine learning (ML) approach that combines remote sensing data and environmental covariates, focusing on the Lower Brazos River Watershed, southern Texas, USA. The predictors were indices derived from Sentinel-2A imagery (vegetation and water indices) together with topographic features, soil properties, and climatic factors. Three ML models, Gradient Boosting (GB), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost), were evaluated using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). All explanatory variables are geospatial gridded datasets; the SOC values used to train the models are point-based measurements from the Prairie View A&M University (PVAMU) research farm plot. In model testing, RF performed best, with the lowest RMSE (4.17) and MAE (3) and the highest R² (0.78). GB ranked second, with an RMSE of 4.23, an MAE of 3.12, and an R² similar to that of RF. The average SOC across the watershed is 45.5 tons/ha, and the total SOC in the watershed is around 4,278,263 tons. These results suggest that integrating satellite data, environmental covariates, and ML models has strong potential for SOC prediction and supports climate change mitigation by improving carbon stock assessments.
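
The three evaluation metrics named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the observed and predicted SOC values are invented, their unit (tons/ha) is an assumption, and the function names are hypothetical. The last lines check the reported totals under the assumption that total SOC is simply mean SOC multiplied by area, which the abstract does not state.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical SOC values (assumed tons/ha), invented for illustration only.
observed = [40.0, 50.0, 45.0, 55.0]
predicted = [42.0, 47.0, 45.0, 56.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r2(observed, predicted),
}

# Values reported in the abstract.
REPORTED_MEAN_SOC_T_PER_HA = 45.5
REPORTED_TOTAL_SOC_T = 4_278_263

# Assumption: total = mean x area, so the implied area is total / mean.
implied_area_ha = REPORTED_TOTAL_SOC_T / REPORTED_MEAN_SOC_T_PER_HA

if __name__ == "__main__":
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
    print(f"Implied area: {implied_area_ha:,.0f} ha")
```

A lower RMSE and MAE and a higher R² indicate a better fit, which is the basis on which the abstract ranks RF ahead of GB. Under the stated assumption, the reported mean and total would correspond to an area of roughly 94,000 ha.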
Source journal: Applied Computing and Geosciences (Computer Science, General Computer Science)
CiteScore: 5.50
Self-citation rate: 0.00%
Articles published: 23
Review time: 5 weeks