Clustering in geo-data science: Navigating uncertainty to select the most reliable method
Author: Behnam Sadeghi
Journal: Ore Geology Reviews, volume 181, Article 106591 (impact factor 3.2; JCR Q1, Geology)
Publication date: 2025-04-15
DOI: 10.1016/j.oregeorev.2025.106591
URL: https://www.sciencedirect.com/science/article/pii/S0169136825001519
Citation count: 0
Abstract
Clustering is a fundamental technique in unsupervised learning that groups data points based on their similarities, enabling the discovery of underlying structures and patterns without the need for labeled examples. In geo-data science, clustering plays a pivotal role in applications such as exploratory data analysis, geochemical anomaly detection, and dimensionality reduction for multivariate data analysis. Mineral exploration, in particular, involves various uncertainties that significantly affect decision-making. A key source of such uncertainty is the choice of clustering method and evaluation metric. Among the widely used clustering methods, K-Means, K-Medoids, Silhouette and Hierarchical Clustering are prominent, with K-Means being the more popular choice in the geosciences. Similarly, evaluation techniques such as Silhouette, Davies-Bouldin, Calinski-Harabasz, Elbow, and the Bayesian Information Criterion are employed, with the Elbow method a frequent favorite in geo-data science. However, questions remain regarding the efficiency and suitability of these methods in different contexts. Should we rely solely on K-Means and Elbow, or should we adopt a more case-specific approach, comparing uncertainties and selecting the methods that minimize them? This research provides a critical review of clustering methods and evaluation metrics in geo-data science. Using the pyClusterWise Python package developed for this study and illustrative examples, it demonstrates the importance of tailoring these choices to the data and of selecting clustering techniques and evaluation metrics according to their associated uncertainties. The aim is to reduce overall uncertainty and improve decision-making outcomes in mineral exploration.
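The question the abstract raises, whether to rely on the Elbow method alone or to compare several evaluation metrics, can be illustrated with a minimal sketch. It is not the pyClusterWise API, which this page does not describe. It is a plain standard-library Python illustration with hypothetical helper names (`make_blobs`, `kmeans`, `inertia`, `silhouette`), run on synthetic two-variable data that does not come from the paper. It assumes Euclidean distance and, to keep runs reproducible, swaps the usual random K-Means start for a deterministic farthest-point initialisation. For each candidate number of clusters it reports the within-cluster sum of squares (the quantity inspected in an Elbow plot) next to the mean silhouette coefficient.

```python
import math
import random


def make_blobs(centers, n_per_center=40, spread=0.5, seed=42):
    """Synthetic 2-D samples around the given centers (illustrative data only)."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(n_per_center)]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances, the quantity read off an Elbow plot."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; 0 is used as a neutral value
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(sum(math.dist(p, q) for q in other) / len(other)  # nearest other cluster
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Three well-separated synthetic groups, then a sweep over candidate k values.
points = make_blobs([(0, 0), (5, 5), (0, 6)])
results = {}
for k in range(2, 7):
    labels, centroids = kmeans(points, k)
    results[k] = (inertia(points, labels, centroids), silhouette(points, labels))
    print(f"k={k}  inertia={results[k][0]:.1f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest mean silhouette:", best_k)
```

In this sketch the inertia column only suggests where the curve bends, whereas the silhouette column gives a direct ranking of the candidate k values. Putting several such metrics side by side, rather than reading a single Elbow plot, is the kind of comparison the abstract argues for.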
About the journal:
Ore Geology Reviews aims to familiarize all earth scientists with recent advances in a number of interconnected disciplines related to the study of, and search for, ore deposits. The reviews range from brief to longer contributions, but the journal preferentially publishes manuscripts that fill the niche between the commonly shorter journal articles and the comprehensive book coverages, and thus has a special appeal to many authors and readers.