Harnessing machine learning tools for water quality assessment in the Kebili shallow aquifers, Southwestern Tunisia
Authors: Zohra Kraiem, Kamel Zouari, Rim Trabelsi
Journal: Acta Geochimica, volume 43, issue 6, pages 1065-1086 (published 2024-04-30)
DOI: 10.1007/s11631-024-00689-z
URL: https://link.springer.com/article/10.1007/s11631-024-00689-z
This study adopts an integrated approach that combines multivariate statistical analysis with machine learning (ML) methods to evaluate groundwater quality in the shallow aquifers of the Djerid and Kebili district, Southern Tunisia, and to assess their suitability for irrigation and/or drinking. A comprehensive hydrochemical assessment of 52 samples using the entropy-weighted water quality index (EWQI) is also proposed. Eleven water parameters were calculated to ascertain the potential use of these resources for irrigation and drinking. Multivariate analysis identified two main components, Dim.1 (variance = 62.3%) and Dim.2 (variance = 22%), attributed to bicarbonate, dissolution, evaporation, and the intrusion of drainage water. Water quality was evaluated with the EWQI model. The calculated EWQI for the 52 Djerid and Kebili samples varied between 7.5 and 152.62, a range of 145.12; the mean (79.12) was lower than the median (88.47). According to the EWQI, only 14 samples (26.92%) are unsuitable for irrigation because of their poor to extremely poor quality. Bivariate plots showed high correlations for EWQI ~ TH (r = 0.93) and EWQI ~ SAR (r = 0.87), indicating that water quality depends on these parameters. Several ML algorithms were applied to classify water quality. The results indicated high prediction accuracy (SVM > LDA > ANN > kNN) and perfect classification for kNN, LDA and Naive Bayes. To develop the prediction models, the dataset was divided into a training set (80%) and a testing set (20%). Model performance was evaluated with the RMSE, MSE, MAE and R² metrics. kNN (R² = 0.9359, MAE = 6.49, MSE = 79.00) and LDA (accuracy = 97.56%; kappa = 96.21%) achieved high accuracy. Moreover, linear regression indicated high correlation for both the training data (R² = 0.9727) and the testing data (R² = 0.9890), which supports the validity of the LDA algorithm for predicting water quality. Cross-validation showed high accuracy (92.31%), sensitivity (89.47%) and specificity (95%). These findings are important for integrated water resource management within the broader context of sustainable development of the Kebili district.
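As an illustration of how the cross-validation figures quoted in the abstract relate to one another, the following minimal Python sketch derives accuracy, sensitivity, specificity, and Cohen's kappa from a binary confusion matrix. The abstract does not give a confusion matrix, so the counts used here (17 true positives, 2 false negatives, 19 true negatives, 1 false positive) are hypothetical; they were chosen only because they happen to reproduce the reported 92.31%, 89.47%, and 95% and should not be read as the authors' data. The names `ConfusionCounts` and `binary_metrics` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity and Cohen's kappa as fractions."""
    total = c.tp + c.fn + c.tn + c.fp
    accuracy = (c.tp + c.tn) / total
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    # Agreement expected by chance, from the row and column totals.
    expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    counts = ConfusionCounts(tp=17, fn=2, tn=19, fp=1)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.2%}")
```

Sensitivity and specificity are computed per class, so a model can score well on one and poorly on the other. That is presumably why the abstract reports both alongside overall accuracy.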
About the journal:
Acta Geochimica serves as an international forum for essential research on geochemistry, the science that uses the tools and principles of chemistry to explain the mechanisms behind major geological systems such as the Earth's crust, its oceans, and the entire Solar System, as well as processes including mantle convection, the formation of planets, and the origins of granite and basalt. The journal focuses on, but is not limited to, the following areas:
• Cosmochemistry
• Mantle Geochemistry
• Ore-deposit Geochemistry
• Organic Geochemistry
• Environmental Geochemistry
• Computational Geochemistry
• Isotope Geochemistry
• NanoGeochemistry
All research articles published in this journal have undergone rigorous peer review. In addition to original research articles, Acta Geochimica publishes reviews and short communications, aiming to rapidly disseminate research results of timely interest, as well as comprehensive reviews of emerging topics in all areas of geochemistry.