Pedro Martínez-Santos, Víctor Gómez-Escalonilla, Silvia Díaz-Alcaide, Manuel Rodríguez del Rosario, Héctor Aguilera
A surrogate approach to model groundwater level in time and space based on tree regressors
Applied Water Science, 15(8). Published 2025-07-18.
DOI: 10.1007/s13201-025-02572-5
https://link.springer.com/article/10.1007/s13201-025-02572-5
Abstract
Groundwater is a crucial resource for humans and the environment. Protecting groundwater supplies requires tools to explore and understand the behavior of aquifers. This research presents a machine learning approach to predict groundwater levels in time and space based on tree regressors. Covariates comprise dynamic and static items, including spatial coordinates, aquifer properties, timestamps, and recharge and pumping data. Certain dynamic variables also include a subset of lag periods to capture seasonality. The algorithms are tested on a set of climatic scenarios to assess their ability to predict stable, declining and recovering groundwater trends. Random forest, ExtraTrees and gradient boosting regression behave similarly, with generalization scores in excess of 0.95 for wet, dry and average climatic conditions. Predictive accuracy exceeds 0.85 when their long-term forecasts are compared with unseen predictions computed with a calibrated numerical model. Feature importance analysis, coupled with partial dependence plots, suggests that tree regressors are able to capture the relevance of dynamic and static variables, which makes the results extrapolable not only in time but also in space. These outcomes offer an alternative for modeling groundwater-related variables without necessarily relying on flow and transport equations. The approach can be readily extrapolated to other settings and might offer a rapid means to obtain useful predictions, provided that enough field data are available.
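The covariate layout described in the abstract (static items, dynamic items, and lagged copies of the dynamic items) can be illustrated with a minimal sketch. This is not the authors' code. The `Well` class, the function names, the lag set (1, 2, 3), the use of an integer time index as the timestamp, the chronological train/test split and all numeric values are assumptions made up for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Well:
    """Static covariates for one hypothetical observation point."""
    x: float               # easting (assumed units: m)
    y: float               # northing (assumed units: m)
    transmissivity: float  # stand-in for "aquifer properties"


def lagged(series: Sequence[float], t: int, lags: Sequence[int]) -> List[float]:
    """Return series[t - lag] for each lag; caller guarantees t >= max(lags)."""
    return [series[t - lag] for lag in lags]


def build_rows(
    well: Well,
    recharge: Sequence[float],
    pumping: Sequence[float],
    levels: Sequence[float],
    lags: Sequence[int] = (1, 2, 3),  # assumed lag periods, not from the paper
) -> Tuple[List[List[float]], List[float]]:
    """Build one feature row per time step that has a full lag history."""
    features, targets = [], []
    for t in range(max(lags), len(levels)):
        # Static items, then the timestamp, then current dynamic items.
        row = [well.x, well.y, well.transmissivity, float(t), recharge[t], pumping[t]]
        # Lagged copies of the dynamic items, intended to carry seasonality.
        row += lagged(recharge, t, lags)
        row += lagged(pumping, t, lags)
        features.append(row)
        targets.append(levels[t])
    return features, targets


def chronological_split(features, targets, train_fraction=0.75):
    """Split without shuffling so the test rows are later in time than the training rows."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


# Made-up monthly series for a single hypothetical well.
well = Well(x=440_000.0, y=4_470_000.0, transmissivity=150.0)
recharge = [10.0, 12.0, 8.0, 5.0, 3.0, 2.0, 1.0, 1.0, 4.0, 9.0, 11.0, 13.0]
pumping = [1.0, 1.0, 2.0, 3.0, 5.0, 6.0, 6.0, 5.0, 3.0, 2.0, 1.0, 1.0]
levels = [600.0, 600.4, 600.5, 600.3, 599.8, 599.2, 598.7, 598.4, 598.5, 598.9, 599.4, 600.0]

X, y = build_rows(well, recharge, pumping, levels)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(len(X), "rows,", len(X[0]), "features per row")
```

Rows built this way could be passed to any tree-ensemble implementation, for example scikit-learn's `RandomForestRegressor`, `ExtraTreesRegressor` or `GradientBoostingRegressor`; the abstract does not state which library the authors used.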