Assessing feature importance for forecasting soil moisture in subarctic regions using gridded historical and forecasted climate data
Authors: Mojtaba Saboori, Kedar Surendranath Ghag, Anandharuban Panchanathan, Ritesh Patro, Ali Torabi Haghighi
Journal: Geoderma, volume 458, Article 117304 (impact factor 5.6; JCR Q1, Soil Science)
Published: 2025-04-27
DOI: 10.1016/j.geoderma.2025.117304
URL: https://www.sciencedirect.com/science/article/pii/S0016706125001429
Citations: 0
Abstract
Continuous monitoring of soil moisture (SM) is essential in precision agriculture for effective irrigation management. However, SM forecasting in subarctic environments remains relatively unexplored. In this study, we forecast SM at a soil depth of 30 cm over a 7-day horizon using a Random Forest (RF) model. Two scenarios were evaluated: (a) relying solely on historical data (HIST), and (b) using forecasted environmental data along with recent SM measurements to predict SM iteratively, integrating next-day forecasts with current SM data (FORENV). The input features included daily gridded climate data (air temperature, T_air; relative humidity, RH; wind speed, WS; precipitation, P; and reference evapotranspiration, ET0), soil-vegetation (SV) features (gridded soil temperature, T_soil, and Normalized Difference Vegetation Index, NDVI), and lagged SM values. These data were gathered from six sites under different land covers in a subarctic region (Tyrnava, Finland) over approximately two growing seasons (July or August 2022–September 2023), yielding about 430 daily observations per site. The analysis showed that FORENV outperformed HIST for lead times of up to four days, highlighting the value of including forecasted variables for improved accuracy at these initial lead times. Performance at longer lead times was more site-dependent, influenced by the stability of historical SM correlations. Pearson correlation and RF-based stepwise forward feature selection revealed that using only lagged SM data, or combining it with SV features, yielded the most accurate forecasts. For instance, at t + 7 and across all case studies combined, models using the LaggedSM_SV input set achieved the lowest RMSE (0.019 m³ m⁻³) and highest R² (0.67), followed by All_inputs (RMSE: 0.022 m³ m⁻³, R²: 0.61) and LaggedSM (RMSE: 0.025 m³ m⁻³, R²: 0.46).
Daily P and RH exhibited consistently low correlations with subsurface SM, likely due to near-saturated soil conditions in many subarctic sites that buffer infiltration and reduce immediate sensitivity to these parameters. Overall, our results demonstrate that robust SM forecasts can be achieved even with limited data, making this approach particularly valuable in subarctic regions with near-saturated soil conditions or other areas where climate and soil-vegetation data may be sparse.
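
The iterative forecasting and scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (make_lagged, fit_ar1, recursive_forecast, rmse, r2), the lag count, and the synthetic series are all hypothetical, and a one-variable least-squares line stands in for the Random Forest so that the example needs only the standard library. Only lagged SM is fed back (roughly the LaggedSM input set); in a FORENV-style setup, each day's forecasted environmental variables would presumably be appended to the feature row as well. R² is computed here as the coefficient of determination, which is an assumption, since the abstract does not state the definition used.

```python
import math
from typing import Callable, List, Sequence, Tuple


def make_lagged(sm: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Build training pairs: the n_lags previous SM values predict the next value."""
    X, y = [], []
    for t in range(n_lags, len(sm)):
        X.append(list(sm[t - n_lags:t]))
        y.append(sm[t])
    return X, y


def fit_ar1(X: List[List[float]], y: List[float]) -> Callable[[List[float]], float]:
    """Stand-in model (NOT a Random Forest): least-squares line on the most recent lag."""
    x = [row[-1] for row in X]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[-1]


def recursive_forecast(predict_next: Callable[[List[float]], float],
                       history: Sequence[float], n_lags: int,
                       horizon: int = 7) -> List[float]:
    """Forecast one day at a time, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        nxt = predict_next(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest lag, append the new forecast
    return forecasts


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as SM."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of R²)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic daily SM series (made-up values, for illustration only).
    sm = [0.30 + 0.05 * math.sin(2 * math.pi * t / 30) for t in range(120)]
    train, observed = sm[:-7], sm[-7:]
    X, y = make_lagged(train, n_lags=3)
    model = fit_ar1(X, y)
    forecast = recursive_forecast(model, train, n_lags=3, horizon=7)
    print("RMSE:", rmse(observed, forecast))
    print("R2:", r2(observed, forecast))
```

Any regressor that maps a feature row to a single value could replace fit_ar1 in this loop. Because each step reuses earlier predictions as inputs, errors can accumulate with lead time, which fits the abstract's observation that performance at longer lead times is less consistent.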
About the journal:
Geoderma - the global journal of soil science - welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.