Advanced hydrogeochemical facies classification: A comparative analysis of Machine Learning models with SMOTE in the Tawi basin

IF 3 | Zone 3, Earth Sciences | Q2, GEOSCIENCES, MULTIDISCIPLINARY
Ajay Kumar Taloor, Shiwalika Sambyal, Ravi Sharma, Surya Dev, Sourabh Shastri, Rakesh Kumar
Physics and Chemistry of the Earth, Volume 137, Article 103785. Journal article, published 2024-10-21.
DOI: 10.1016/j.pce.2024.103785
https://www.sciencedirect.com/science/article/pii/S1474706524002432
Citations: 0

Abstract

Water is an important natural resource, and clean water is vital for maintaining the health and hygiene of all living organisms. Estimating and classifying water quality facies is a critical step in analysing water quality and supporting proper water management. The present study demonstrates the applicability of Machine Learning (ML) models to water quality assessment by classifying hydrogeochemical facies within the Tawi basin of the Jammu region. It employs a range of ML algorithms, including Decision Tree (DT), XGBoost, Random Forest (RF), K-Nearest Neighbors (KNN), and Artificial Neural Network (ANN), and evaluates how accurately each classifies hydrogeochemical facies derived from Piper's diagram. The dataset, consisting of chemical parameters extracted from water samples collected from the Tawi basin, was initially imbalanced, with a large majority of samples belonging to a single facies. To address this, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance the class distributions for more reliable model training and evaluation. The classification results demonstrate high accuracy across the models, with DT achieving 93%, RF 99%, XGBoost 96%, KNN 81%, and ANN 96%. In addition to overall accuracy, we employed other evaluation metrics such as precision, recall, F1-score, and the precision-recall curve to provide a more comprehensive assessment of model performance. The results underscore the potential of ML in automating water quality assessment based on hydrogeochemical parameters. The findings of the study provide a robust framework for using ML models in determining water quality, particularly in regions where data is scarce and conventional analysis is limited.
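The balancing step described in the abstract can be illustrated with a minimal, standard-library sketch of the SMOTE idea: each synthetic sample is a random interpolation between a minority-class sample and one of its k nearest neighbours in the same class. This is not the authors' implementation. The function names `smote_oversample` and `balance`, the default `k=5`, and the choice to raise every class to the size of the largest one are assumptions made here for illustration only.

```python
import math
import random
from collections import Counter


def smote_oversample(samples, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated within one class.

    `samples` is a list of equal-length numeric tuples (hypothetical
    feature vectors, e.g. ion concentrations for one facies).
    """
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Indices of the k nearest same-class neighbours, excluding base.
        neighbour_ids = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(base, samples[j]),
        )[:k]
        neighbour = samples[rng.choice(neighbour_ids)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count == target:
            continue
        members = [x for x, lab in zip(X, y) if lab == label]
        new_points = smote_oversample(members, target - count, k=k, seed=seed)
        X_out.extend(new_points)
        y_out.extend([label] * len(new_points))
    return X_out, y_out
```

Calling `balance(X, y)` on a list of feature tuples and a parallel list of facies labels returns the original samples followed by the synthetic ones. Oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data. The abstract does not say how the split was handled in the study.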
Source journal
Physics and Chemistry of the Earth
Category: Geosciences, Multidisciplinary
CiteScore: 5.40
Self-citation rate: 2.70%
Articles published: 176
Review time: 31.6 weeks
About the journal: Physics and Chemistry of the Earth is an international interdisciplinary journal for the rapid publication of collections of refereed communications in separate thematic issues, either stemming from scientific meetings or specially compiled for the occasion. There is no restriction on the length of articles published in the journal. Physics and Chemistry of the Earth incorporates the separate Parts A, B and C which existed until the end of 2001. Please note: the Editors are unable to consider submissions that are not invited or linked to a thematic issue. Please do not submit unsolicited papers. The journal covers the following subject areas:
-Solid Earth and Geodesy: geology, geochemistry, tectonophysics, seismology, volcanology, palaeomagnetism and rock magnetism, electromagnetism and potential fields, marine and environmental geosciences, and geodesy.
-Hydrology, Oceans and Atmosphere: hydrology and water resources research, engineering and management, oceanography and oceanic chemistry, shelf, sea, lake and river sciences, meteorology and atmospheric sciences including chemistry, and climatology and glaciology.
-Solar-Terrestrial and Planetary Science: solar, heliospheric and solar-planetary sciences, geology, geophysics and atmospheric sciences of planets, satellites and small bodies, and cosmochemistry and exobiology.