Clustering in geo-data science: Navigating uncertainty to select the most reliable method

Impact factor 3.2; Zone 2 (Earth Sciences); JCR Q1 (Geology)
Behnam Sadeghi
Journal: Ore Geology Reviews, volume 181, Article 106591
Published: 2025-04-15
DOI: 10.1016/j.oregeorev.2025.106591
URL: https://www.sciencedirect.com/science/article/pii/S0169136825001519
Citations: 0

Abstract

Clustering is a fundamental unsupervised learning technique that groups data points by similarity, enabling the discovery of underlying structures and patterns without labeled examples. In geo-data science, clustering plays a pivotal role in applications such as exploratory data analysis, geochemical anomaly detection, and dimensionality reduction for multivariate data analysis. Mineral exploration in particular involves various uncertainties that significantly affect decision-making. A key source of such uncertainty is the choice of clustering method and evaluation metric. Among the widely used clustering methods, K-Means, K-Medoids, Silhouette and Hierarchical Clustering are prominent, with K-Means being the more popular choice in the geosciences. Similarly, evaluation techniques such as Silhouette, Davies-Bouldin, Calinski-Harabasz, Elbow, and the Bayesian Information Criterion are employed, with the Elbow method a frequent favorite in geo-data science. However, questions remain about the efficiency and suitability of these methods in different contexts. Should we rely solely on K-Means and Elbow, or should we adopt a more case-specific approach, comparing the uncertainties of candidate methods and selecting those that minimize them? This research provides a critical review of clustering methods and evaluation metrics in geo-data science. Using the pyClusterWise Python package, developed for this study, together with illustrative examples, it demonstrates the importance of tailoring these choices to the data and of selecting clustering techniques and evaluation metrics based on their associated uncertainties. The aim is to reduce overall uncertainty and improve decision-making outcomes in mineral exploration.
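The comparison the abstract argues for, checking more than one evaluation criterion before fixing the number of clusters, can be illustrated with a small sketch. It is not the pyClusterWise API, which this page does not describe. It is a standard-library-only toy that runs plain K-Means (Lloyd's algorithm) for several values of k and reports both the within-cluster sum of squares used by the Elbow method and the mean Silhouette coefficient. The synthetic three-blob data, the function names, and the deterministic farthest-point initialisation are all assumptions made for illustration.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption for reproducibility): start at
    points[0], then repeatedly add the point farthest from its nearest centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Inertia: within-cluster sum of squared distances (the Elbow criterion).
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical data: three well-separated 2-D Gaussian blobs standing in
# for standardized geochemical variables.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(40)]

results = {}
for k in range(2, 7):
    labels, inertia = kmeans(points, k)
    results[k] = (inertia, silhouette(points, labels))
    print(f"k={k}  inertia={results[k][0]:.1f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest mean silhouette:", best_k)
```

On data this clean the two criteria should agree. The abstract's point is that on real geochemical data they may not, so the disagreement itself is information about how uncertain the chosen partition is.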

Source journal: Ore Geology Reviews (Earth Sciences: Geology)
CiteScore: 6.50
Self-citation rate: 27.30%
Articles published: 546
Review time: 22.9 weeks
About the journal: Ore Geology Reviews aims to familiarize all earth scientists with recent advances in a number of interconnected disciplines related to the study of, and search for, ore deposits. The reviews range from brief to longer contributions, but the journal preferentially publishes manuscripts that fill the niche between the commonly shorter journal articles and the comprehensive book coverages, and thus has a special appeal to many authors and readers.