Title: Harnessing Variational Autoencoders and self-organising maps for groundwater contamination assessment in peri-urban Ghana
Authors: Portia Annabelle Opoku, Raymond Webrah Kazapoe, Noah Kwaku Baah, Abass Gibrilla, Geophrey K. Anornu, Nana Kobea Bonso
Journal: Journal of African Earth Sciences, volume 233, Article 105866 (impact factor 2.2; JCR Q2, Geosciences, Multidisciplinary)
Publication date: 2025-09-27
DOI: 10.1016/j.jafrearsci.2025.105866
URL: https://www.sciencedirect.com/science/article/pii/S1464343X25003334
Citations: 0
Abstract
Although advanced machine learning models have demonstrated considerable potential for environmental monitoring, their application to assessing groundwater contamination in Ghana's peri-urban areas remains inadequately explored and poorly understood. To bridge this gap, this study aimed to apply advanced non-linear machine learning techniques, specifically Variational Autoencoders (VAEs) and Self-Organising Maps (SOMs), to analyse groundwater contaminants in south-eastern Ghana. The study examines intricate relationships and patterns among various pollutants to provide a comprehensive evaluation of groundwater quality. All the physicochemical parameters evaluated fell within the WHO guideline values. The VAE and SOM analyses confirm dual-source controls on groundwater chemistry in the Birimian terrains, involving both natural geogenic inputs from silicate and mafic lithologies and anthropogenic impacts from settlements. Inverse loadings across latent dimensions captured spatial heterogeneity, separating lithology-driven variables (e.g., Na⁺, Ca²⁺, EC) from pollution markers (e.g., NO₃⁻, Cl⁻). SOM clustering further distinguished zones of minimal human influence from areas with localised contamination, such as Pb hotspots and elevated EC and salinity linked to mineralisation or saline intrusion. Scattered peaks in F⁻ and Cl⁻ suggested episodic anthropogenic inputs. The results reveal notable disparities in machine learning model performance based on target variable features; the Nitrate Pollution Index (NPI) yielded a test R² of 0.983, indicating superior predictive accuracy. Conversely, challenges with the Fluoride Pollution Index (FPI) and Pollution Index of Groundwater (PIG) exposed limitations due to unmeasured geological factors and low variability. We propose a data-driven, scalable diagnostic tool for monitoring water quality that can be integrated into national frameworks. This tool has implications for Sub-Saharan Africa and other regions similarly affected.
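To make the SOM clustering step more concrete, the following is a minimal NumPy sketch of a self-organising map trained on synthetic data. It is not the authors' implementation: the abstract does not state the grid size, training schedule, feature set, or software used, so the 3×3 grid, the linear decay schedules, the five feature names, and the two synthetic groups ("background" and "impacted") are all illustrative assumptions.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    d2 = np.sum((weights - x) ** 2, axis=-1)
    return np.unravel_index(np.argmin(d2), d2.shape)


def train_som(data, grid_shape=(3, 3), n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Initialise each node from a randomly chosen sample.
    idx = rng.integers(0, len(data), size=rows * cols)
    weights = data[idx].reshape(rows, cols, -1).astype(float)
    # Grid coordinates of every node, used by the neighbourhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-2  # shrinking neighbourhood radius
        x = data[rng.integers(0, len(data))]
        bmu = best_matching_unit(weights, x)
        grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        # Pull the winning node and its grid neighbours towards the sample.
        weights += lr * h[..., None] * (x - weights)
    return weights


# Synthetic, illustrative data only (not values from the study).
# Assumed feature order: [Na, Ca, EC, NO3, Cl]
rng = np.random.default_rng(42)
background = rng.normal(loc=[1.0, 1.0, 1.0, -1.0, -1.0], scale=0.3, size=(40, 5))
impacted = rng.normal(loc=[-1.0, -1.0, -1.0, 1.0, 1.0], scale=0.3, size=(40, 5))
samples = np.vstack([background, impacted])
# Standardise each feature so that no single variable dominates the distances.
samples = (samples - samples.mean(axis=0)) / samples.std(axis=0)

som = train_som(samples)
nodes = [best_matching_unit(som, s) for s in samples]

print("nodes hit by background samples:", sorted({tuple(int(i) for i in n) for n in nodes[:40]}))
print("nodes hit by impacted samples:  ", sorted({tuple(int(i) for i in n) for n in nodes[40:]}))
```

In a workflow of this kind, each sample is assigned to its best-matching node, and samples sharing a node or neighbouring nodes are read as one hydrochemical group. Standardising the inputs first is a common choice because parameters such as EC and ion concentrations are measured on very different scales.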
About the journal:
The Journal of African Earth Sciences sees itself as the prime geological journal for all aspects of the Earth Sciences about the African plate. Papers dealing with peripheral areas are welcome if they demonstrate a tight link with Africa.
The Journal publishes high quality, peer-reviewed scientific papers. It is devoted primarily to research papers but short communications relating to new developments of broad interest, reviews and book reviews will also be considered. Papers must have international appeal and should present work of more regional than local significance and dealing with well identified and justified scientific questions. Specialised technical papers, analytical or exploration reports must be avoided. Papers on applied geology should preferably be linked to such core disciplines and must be addressed to a more general geoscientific audience.