Groundwater health probability risk prediction through oral intake using advanced optimization methods

IF 4.4 | Zone 3, Environmental Science and Ecology | JCR Q2, Environmental Sciences
Fahad Jibrin Abdu, Sani I. Abba, Jamilu Usman, Maad Alowaifeer, Isam H. Aljundi
Journal of Contaminant Hydrology, Volume 274, Article 104670. Published 2025-07-07.
DOI: 10.1016/j.jconhyd.2025.104670
https://www.sciencedirect.com/science/article/pii/S0169772225001755
Citations: 0

Abstract

Examining the cancer risk associated with oral groundwater (GW) intake is crucial, particularly in regions heavily reliant on GW for human consumption and agriculture. The study was based on real field investigations and controlled laboratory experiments. We integrated real experimental data with generative AI-driven synthetic data to construct a comprehensive dataset, and then compared the predictive efficiency of the two data sources. We evaluated the reliability of generative AI in generating scientific data, providing critical insights into its applicability for enhancing experimental analysis. The study also evaluates standalone models, including Artificial Neural Networks (ANN), Gaussian Process Regression (GPR), Support Vector Machines (SVM), and Boosted Trees (BT), with and without Bayesian Optimization (BO), for predicting the probability of cancer risk (PCR) from GW ingestion. On real data, during training, ANN achieved the lowest Mean Absolute Error (MAE = 0.1483), Mean Square Error (MSE = 0.1231), and Root Mean Square Error (RMSE = 0.3508), while GPR, SVM, and BT exhibited higher training errors. In the testing phase, ANN continued to lead with an MAE of 0.5733, MSE of 0.6356, and RMSE of 0.7972. When optimized with BO, ANN-BO achieved an MAE of 0.1686, MSE of 0.1097, and RMSE of 0.3312 during training, with GPR-BO performing almost identically (MAE = 0.1679, MSE = 0.1095, RMSE = 0.3310). During testing with BO, ANN-BO further improved (MAE = 0.0902, MSE = 0.0129, RMSE = 0.1136). However, on synthetic data, even optimized models like ANN-BO showed far higher testing error (MAE = 15.718, MSE = 374.53, RMSE = 19.353), underscoring the limitations of synthetic data in capturing real-world complexities. High error values across models indicate that synthetic data alone is insufficient for accurate health risk assessments. Real-world data therefore remains essential for improving predictive accuracy and minimizing errors, which underscores the crucial role of data quality in achieving reliable cancer risk predictions from GW ingestion.
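
As a reading aid, the three error metrics quoted in the abstract can be sketched as below, assuming their standard definitions (the abstract does not restate them, and it does not say which software the authors used). The function names `regression_errors` and `rmse_gap` are illustrative and not from the paper. The second part only checks that each reported RMSE is consistent with the square root of the reported MSE, up to rounding; it does not reproduce any model.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return mae, mse, math.sqrt(mse)


# (MSE, RMSE) pairs copied from the abstract; the labels are shorthand.
REPORTED = {
    "ANN, real data, training": (0.1231, 0.3508),
    "ANN, real data, testing": (0.6356, 0.7972),
    "ANN-BO, real data, training": (0.1097, 0.3312),
    "GPR-BO, real data, training": (0.1095, 0.3310),
    "ANN-BO, real data, testing": (0.0129, 0.1136),
    "ANN-BO, synthetic data, testing": (374.53, 19.353),
}


def rmse_gap(mse, rmse):
    """Relative difference between sqrt(MSE) and the reported RMSE."""
    return abs(math.sqrt(mse) - rmse) / rmse


if __name__ == "__main__":
    # Hypothetical toy values, only to show the call; not data from the study.
    print(regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    for label, (mse, rmse) in REPORTED.items():
        print(f"{label}: relative gap = {rmse_gap(mse, rmse):.1e}")
```

Because RMSE is the square root of MSE, the two columns carry the same information; MAE is the one metric here that weights large and small residuals equally.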
Source journal: Journal of Contaminant Hydrology (Environmental Sciences; Geosciences, Multidisciplinary)
CiteScore: 6.80
Self-citation rate: 2.80%
Articles published: 129
Review time: 68 days
Journal description: The Journal of Contaminant Hydrology is an international journal publishing scientific articles pertaining to the contamination of subsurface water resources. Emphasis is placed on investigations of the physical, chemical, and biological processes influencing the behavior and fate of organic and inorganic contaminants in the unsaturated (vadose) and saturated (groundwater) zones, as well as at groundwater-surface water interfaces. The ecological impacts of contaminants transported both from and to aquifers are of interest. Articles on contamination of surface water only, without a link to groundwater, are out of scope. Broad latitude is allowed in identifying contaminants of interest, which include legacy and emerging pollutants, nutrients, nanoparticles, pathogenic microorganisms (e.g., bacteria, viruses, protozoa), microplastics, and various constituents associated with energy production (e.g., methane, carbon dioxide, hydrogen sulfide). The journal's scope embraces a wide range of topics including: experimental investigations of contaminant sorption, diffusion, transformation, volatilization and transport in the surface and subsurface; characterization of soil and aquifer properties only as they influence contaminant behavior; development and testing of mathematical models of contaminant behaviour; innovative techniques for restoration of contaminated sites; development of new tools or techniques for monitoring the extent of soil and groundwater contamination; transformation of contaminants in the hyporheic zone; effects of contaminants traversing the hyporheic zone on surface water and groundwater ecosystems; subsurface carbon sequestration and/or turnover; and migration of fluids associated with energy production into groundwater.