Pongsathorn Thunyawatcharakul, Kyung Hwa Cho and Srilert Chotpantarat*
ACS ES&T Water, volume 5, issue 9, pages 5011–5024. Published 2025-07-31. DOI: 10.1021/acsestwater.4c01082 (https://pubs.acs.org/doi/10.1021/acsestwater.4c01082)
Predicting Arsenic Speciation in Coastal Aquifers Using Machine Learning: A Case Study of the Chonburi and Rayong Groundwater Basins, Thailand
This study developed machine learning models to predict arsenic speciation, focusing on As(III), in contaminated groundwater systems. Two input sets were considered: a full set containing comprehensive hydrochemical variables for high-accuracy prediction, and a reduced on-site set including only field-measurable parameters and total arsenic, designed for rapid and cost-effective screening. Because As(III) data were limited, the models were trained to estimate As(V) instead. Three algorithms, random forest (RF), support vector regression (SVR), and artificial neural network (ANN), were evaluated using 5-fold cross-validation. RF achieved the highest accuracy with the full set, while SVR showed the most robust performance across both input sets. ANN underperformed because of overfitting caused by a scarcity of high-concentration samples. The margin-based learning of SVR allowed the model to remain stable with fewer inputs and with outliers included, suggesting its suitability for rapid screening. The proposed SVR model can reduce arsenic speciation analysis costs by minimizing laboratory requirements while maintaining reliable accuracy, requiring only total As concentration in addition to field-measurable parameters. These findings support the integration of SVR-based models into groundwater monitoring frameworks and public health policies, particularly in arsenic-affected regions with limited resources, contributing to more accessible and efficient arsenic risk assessment.
Machine learning models predict arsenic speciation in groundwater using both annual hydrochemical datasets and limited on-site field measurements.
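The 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than details from the study: the data are synthetic, a one-variable least-squares line stands in for the RF, SVR, and ANN models, the function names are hypothetical, and the step that recovers As(III) by subtracting predicted As(V) from total As is an assumed post-processing step that the abstract does not spell out.

```python
import math
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in for RF/SVR/ANN)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return my - b * mx, b

def cross_val_rmse(x, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, return one RMSE per fold."""
    scores = []
    for fold in k_fold_indices(len(x), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err = [(y[i] - (a + b * x[i])) ** 2 for i in fold]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores

def as3_by_difference(total_as, as5_pred):
    """Assumed step: As(III) = total As - predicted As(V), kept within [0, total As]."""
    return min(max(total_as - as5_pred, 0.0), total_as)

# Synthetic, illustrative data only (not from the study); concentrations in ug/L.
rng = random.Random(42)
total_as = [rng.uniform(1.0, 50.0) for _ in range(40)]
as5 = [min(max(0.7 * t + rng.gauss(0.0, 1.0), 0.0), t) for t in total_as]

fold_rmse = cross_val_rmse(total_as, as5, k=5)
print("RMSE per fold:", [round(s, 2) for s in fold_rmse])
print("Mean RMSE:", round(sum(fold_rmse) / len(fold_rmse), 2))
```

Reporting the per-fold scores alongside the mean makes it easier to see whether a model is stable across splits, which is the kind of robustness the abstract attributes to SVR.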