A machine learning approach to site groundwater contamination monitoring wells
V. Gómez-Escalonilla, E. Montero-González, S. Díaz-Alcaide, M. Martín-Loeches, M. Rodríguez del Rosario, P. Martínez-Santos
Applied Water Science, vol. 14, no. 12. Published 2024-11-07. DOI: 10.1007/s13201-024-02320-1
https://link.springer.com/article/10.1007/s13201-024-02320-1
Citations: 0
Abstract
Effective monitoring of groundwater contamination is crucial to protect human livelihoods and ecosystems. This paper presents a machine learning-based approach to improve groundwater monitoring networks by predicting the spatial distribution of groundwater contamination. The method is demonstrated through a practical application in Central Spain, where nitrate was used as a proxy for groundwater contamination. Predictive mapping identifies the spatial markers of groundwater contamination based on twenty-four predictor variables and a dataset of 213 existing monitoring boreholes. Tree-based algorithms found meaningful associations between the explanatory variables and known nitrate concentrations. Comparing the outcomes of the algorithms with the areas officially delineated as vulnerable to nitrate suggests that machine learning algorithms are able to predict groundwater contamination. The extra trees algorithm outperformed the decision tree, random forest, gradient boosting, and AdaBoost classifiers, with an area under the curve (AUC) score in excess of 0.88. The major predictors of groundwater contamination were depth to the water table, lithology, distance to rivers, and distance to livestock farms. Predictive mapping suggests that unmonitored regions to the northeast and southwest of Madrid’s metropolitan area present markers similar to those of monitored regions known to be contaminated. These unmonitored areas should be prioritized in future attempts to improve the network. From a research perspective, the main conclusion of this work is that machine learning techniques can be used to automate the siting of monitoring boreholes. Practical applications should nevertheless be overseen by an expert to guarantee the quality of the outcomes.
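
To make the reported metric concrete, the sketch below shows one common way to compute the area under the ROC curve for a binary contamination classifier. The AUC equals the probability that a randomly chosen contaminated borehole receives a higher predicted score than a randomly chosen uncontaminated one. The abstract does not describe the authors' evaluation code, so this is only an illustration: the function name `roc_auc`, the labels, and the scores are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Returns the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper):
# label 1 = nitrate above a chosen threshold, 0 = below it;
# scores = a classifier's predicted probability of contamination.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.7, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")  # 15 of 16 pairs ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a sense of scale for the score above 0.88 reported for the extra trees classifier. The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of a few hundred boreholes; a rank-based formulation would be preferable for much larger datasets.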