Strategic imputation of groundwater data using machine learning: Insights from diverse aquifers in the Chao-Phraya River Basin

Impact Factor 4.9, JCR Q2 (Engineering, Environmental)
Yaggesh Kumar Sharma, Seokhyeon Kim, Amir Saman Tayerani Charmchi, Doosun Kang, Okke Batelaan
Groundwater for Sustainable Development, Volume 28, Article 101394. Published 2025-02-01.
DOI: 10.1016/j.gsd.2024.101394
URL: https://www.sciencedirect.com/science/article/pii/S2352801X24003175
Citations: 0

Abstract

Effective groundwater monitoring is essential for sustainable water management, particularly in data-sparse regions. To address inconsistencies in groundwater level data, we developed a machine learning framework for robust data imputation and tested it in the Chao-Phraya River (CPR) Basin, a region facing significant groundwater challenges due to high population density and ecological importance. Our study evaluated five models, namely K-Nearest Neighbors (KNN), Multiple Imputation by Chained Equations (MICE), Multilayer Perceptron (MLP), Random Forest (RF), and Soft Imputation (SI), for filling gaps in monthly groundwater level data across various locations, aquifer depths, and data loss scenarios. Results show that MICE performs well where well density is high, while SI excels where well density is lower, maintaining Pearson correlation coefficients (R) above 0.80 and RMSE values below 6 even at 10% data loss. The Coefficient of Variation (COV) analysis also confirmed that the imputed data remain stable and reliable. However, the study also reveals a significant decrease in model performance in regions with fewer wells, as indicated by increased RMSE and reduced R. Our findings indicate that machine learning models are capable of handling groundwater level observations with missing data, and that the well density in a region has a significant impact on these models' performance. Imputation techniques should therefore be tailored to each aquifer's specific characteristics and surroundings to obtain accurate groundwater data.
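The evaluation idea described in the abstract (hide a known fraction of observations, impute them, then score the imputed values against the hidden truth with RMSE and Pearson R) can be illustrated with a small sketch. This is not the authors' framework: the data are synthetic, the units (metres) and all function names are hypothetical, and the imputer is a simple pure-Python nearest-neighbour scheme standing in for the KNN model named above.

```python
import math
import random


def make_synthetic_levels(n_months=120, n_wells=6, seed=0):
    """Synthetic monthly levels (assumed metres): shared seasonal cycle, per-well offset, noise."""
    rng = random.Random(seed)
    offsets = [rng.uniform(5.0, 30.0) for _ in range(n_wells)]
    amplitudes = [rng.uniform(1.0, 3.0) for _ in range(n_wells)]
    return [
        [offsets[w] + amplitudes[w] * math.sin(2 * math.pi * t / 12) + rng.gauss(0.0, 0.2)
         for w in range(n_wells)]
        for t in range(n_months)
    ]


def mask_random(data, loss_fraction, seed=1):
    """Copy the data, set a random fraction of cells to None, and return the masked positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = rng.sample(cells, round(loss_fraction * len(cells)))
    gappy = [row[:] for row in data]
    for i, j in masked:
        gappy[i][j] = None
    return gappy, masked


def knn_impute(gappy, k=5):
    """Fill each gap with the mean of that well's level in the k most similar months."""
    filled = [row[:] for row in gappy]
    for i, row in enumerate(gappy):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(gappy):
                if other_i == i or other[j] is None:
                    continue
                # Mean squared difference over the wells observed in both months.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                # Fallback (assumption): no comparable month, so use the well's observed mean.
                nearest = [r[j] for r in gappy if r[j] is not None]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


def rmse(truth, estimate):
    """Root-mean-square error between paired values."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def pearson_r(x, y):
    """Pearson correlation coefficient between paired values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


levels = make_synthetic_levels()
gappy, masked = mask_random(levels, loss_fraction=0.10)  # the 10% data loss scenario
filled = knn_impute(gappy, k=5)

# Score only the cells that were hidden, since those are the imputed ones.
truth = [levels[i][j] for i, j in masked]
estimate = [filled[i][j] for i, j in masked]
print(f"RMSE = {rmse(truth, estimate):.3f}, R = {pearson_r(truth, estimate):.3f}")
```

Scoring only the masked cells keeps the metrics from being inflated by values that were never missing. Reducing `n_wells` in the synthetic generator gives a rough feel for why fewer neighbouring wells leave the imputer with less information, although the synthetic numbers say nothing about the CPR Basin results themselves.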

Source journal
Groundwater for Sustainable Development
Subject area: Social Sciences - Geography, Planning and Development
CiteScore: 11.50
Self-citation rate: 10.20%
Articles published: 152
Journal description: Groundwater for Sustainable Development is directed to different stakeholders and professionals, including government and non-governmental organizations, international funding agencies, universities, public water institutions, public health and other public/private sector professionals, and other relevant institutions. It is aimed at professionals, academics and students in disciplines such as groundwater and its connection to surface hydrology and environment, soil sciences, engineering, ecology, microbiology, atmospheric sciences, analytical chemistry, hydro-engineering, water technology, environmental ethics, economics, public health, policy, as well as social sciences, legal disciplines, or any other area connected with water issues. The objectives of this journal are to facilitate:
• The improvement of effective and sustainable management of water resources across the globe.
• The improvement of human access to groundwater resources in adequate quantity and good quality.
• The meeting of the increasing demand for drinking and irrigation water needed for food security, to contribute to a socially and economically sound human development.
• The creation of a global inter- and multidisciplinary platform and forum to improve our understanding of groundwater resources and to advocate their effective and sustainable management and protection against contamination.
• Interdisciplinary information exchange and the stimulation of scientific research in the fields of groundwater-related sciences and the social and health sciences required to achieve the United Nations Millennium Development Goals for sustainable development.