Soil organic carbon retrieval using a machine learning approach from satellite and environmental covariates in the Lower Brazos River Watershed, Texas, USA
Impact factor 3.2; JCR Q2 (Computer Science, Interdisciplinary Applications)
Authors: Birhan Getachew Tikuye, Ram Lakhan Ray
Journal: Applied Computing and Geosciences, volume 26, Article 100252
Published: 2025-05-16
DOI: 10.1016/j.acags.2025.100252
URL: https://www.sciencedirect.com/science/article/pii/S2590197425000345
Citations: 0
Abstract
Soil plays a critical role in global carbon storage, holding more carbon than terrestrial vegetation and the atmosphere combined. Accurate estimation of soil organic carbon (SOC) is essential for improving agricultural productivity and mitigating climate change. This study explores the retrieval of SOC with a machine learning (ML) approach that combines remote sensing data and environmental covariates, focusing on the Lower Brazos River Watershed, southern Texas, USA. The predictors were indices derived from Sentinel-2A satellite data, such as vegetation and water indices, together with topographic features, soil properties, and climatic factors. Three ML models, namely Gradient Boosting (GB), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost), were applied, and their performance was assessed using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). All explanatory variables are gridded geospatial datasets; the SOC values used to train the models are point-based measurements from the Prairie View A&M University (PVAMU) research farm plot. The RF model performed best in model testing, with the lowest RMSE (4.17) and MAE (3) and the highest R² (0.78). GB was the second-best model, with an RMSE of 4.23, an MAE of 3.12, and an R² similar to that of the RF model. The average SOC across the watershed is 45.5 tons/ha, and the total SOC in the watershed is around 4,278,263 tons. These results suggest that integrating satellite data with environmental covariates and ML models holds excellent potential for SOC prediction and supports climate change mitigation efforts by improving carbon stock assessments.
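The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function names and the observed and predicted SOC values below are hypothetical and are not taken from the study. R² is computed here with the common definition 1 - SS_res/SS_tot, which is an assumption; the paper may have used a different formulation, such as the squared correlation.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical SOC values (tons/ha) for four test points; illustration only.
observed = [40.0, 50.0, 45.0, 55.0]
predicted = [42.0, 48.0, 46.0, 51.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r2(observed, predicted),
}
print(scores)
```

Lower RMSE and MAE and a higher R² indicate a better fit, which is the basis on which the abstract ranks RF ahead of GB. RMSE penalizes large residuals more heavily than MAE, so reporting both gives some indication of whether errors are dominated by a few poorly predicted points.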