Assessing feature importance for forecasting soil moisture in subarctic regions using gridded historical and forecasted climate data

Impact factor 5.6; Zone 1, Agricultural and Forestry Sciences; JCR Q1, Soil Science
Mojtaba Saboori, Kedar Surendranath Ghag, Anandharuban Panchanathan, Ritesh Patro, Ali Torabi Haghighi
Geoderma, Volume 458, Article 117304. Published 2025-04-27. DOI: 10.1016/j.geoderma.2025.117304
https://www.sciencedirect.com/science/article/pii/S0016706125001429
Citation count: 0

Abstract

Continuous monitoring of soil moisture (SM) is essential in precision agriculture for effective irrigation management. However, SM forecasting in subarctic environments remains relatively unexplored. In this study, we forecast SM at a soil depth of 30 cm over a 7-day horizon using a Random Forest (RF) model. Two scenarios were evaluated: (a) HIST, which relies solely on historical data, and (b) FORENV, which predicts SM iteratively by combining the next day's forecasted environmental data with the most recent SM value. The input features were daily gridded climate data: air temperature (Tair), relative humidity (RH), wind speed (WS), precipitation (P), and reference evapotranspiration (ET0); soil-vegetation (SV) features: gridded soil temperature (Tsoil) and the Normalized Difference Vegetation Index (NDVI); and lagged SM values. The data were gathered from six sites under different land covers in a subarctic region (Tyrnava, Finland) over approximately two growing seasons (July or August 2022 to September 2023), yielding about 430 daily observations per site. FORENV outperformed HIST for lead times of up to four days, which highlights the value of forecasted variables for accuracy at short lead times. At longer lead times, performance was more site-dependent and was influenced by the stability of historical SM correlations. Pearson correlation and RF-based stepwise forward feature selection showed that using only lagged SM data, or combining it with SV features, yielded the most accurate forecasts. For instance, at t + 7 and across all case studies combined, models using LaggedSM_SV achieved the lowest RMSE (0.019 m³ m⁻³) and the highest R² (0.67), followed by All_inputs (RMSE: 0.022 m³ m⁻³, R²: 0.61) and LaggedSM (RMSE: 0.025 m³ m⁻³, R²: 0.46). Daily P and RH exhibited consistently low correlations with subsurface SM, likely because near-saturated soil conditions at many subarctic sites buffer infiltration and reduce the immediate sensitivity of SM to these variables. Overall, our results demonstrate that robust SM forecasts can be achieved even with limited data, making this approach particularly valuable in subarctic regions with near-saturated soil conditions and in other areas where climate and soil-vegetation data may be sparse.
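
The iterative FORENV scheme and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: `recursive_forecast`, `toy_model`, and the feature layout `(precipitation, ET0)` are hypothetical names introduced here, and `toy_model` is a hand-written stand-in for a trained one-step-ahead RF. R² is computed as the coefficient of determination, which is an assumption, since the abstract does not say which definition was used.

```python
from math import sqrt
from typing import Callable, List, Sequence


def recursive_forecast(
    last_sm: float,
    env_forecasts: Sequence[Sequence[float]],
    predict_one_step: Callable[[float, Sequence[float]], float],
) -> List[float]:
    """Roll a one-step-ahead model forward over the forecast horizon.

    Each day's predicted SM is fed back as the lagged SM input for the
    following day, together with that day's forecasted environmental data.
    """
    predictions = []
    sm = last_sm
    for env in env_forecasts:
        sm = predict_one_step(sm, env)
        predictions.append(sm)
    return predictions


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def toy_model(sm_lag: float, env: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained one-step model (not the paper's RF)."""
    precip_mm, et0_mm = env
    return sm_lag + 0.001 * precip_mm - 0.002 * et0_mm


# Made-up 7-day forecast of (precipitation in mm, ET0 in mm); illustration only.
env_7day = [(0.0, 2.0), (5.0, 1.0), (0.0, 2.5), (1.0, 2.0),
            (0.0, 3.0), (8.0, 0.5), (0.0, 2.0)]

# Start from the most recent SM value (m3 m-3) and forecast t+1 ... t+7.
forecast = recursive_forecast(0.30, env_7day, toy_model)
```

Because each step consumes the previous prediction, errors can accumulate with lead time, which is one plausible reading of why the advantage of FORENV over HIST was limited to the first four days.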
Source journal: Geoderma (Agricultural and Forestry Sciences, Soil Science)
CiteScore: 11.80
Self-citation rate: 6.60%
Articles published: 597
Review time: 58 days
About the journal: Geoderma, the global journal of soil science, welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.