A Bayesian Approach to Clustering via the Proper Bayesian Bootstrap: the Bayesian Bagged Clustering (BBC) algorithm
Federico Maria Quetti, Silvia Figini, Elena Ballante
arXiv - STAT - Machine Learning, 2024-09-13, arXiv:2409.08954
Abstract
The paper presents a novel unsupervised technique in the field of clustering.
A new method is proposed that enhances existing models from the literature by
using the proper Bayesian bootstrap to improve results in terms of robustness
and interpretability. Our approach is organized in two steps: k-means
clustering is used for prior elicitation, then the proper Bayesian bootstrap is
applied as a resampling method in an ensemble clustering approach. Results are
analyzed by introducing measures of uncertainty based on Shannon entropy. The
proposal provides a clear indication of the optimal number of clusters, as well
as a better representation of the clustered data. Empirical results on
simulated data are provided, showing the methodological and empirical advances
obtained.
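The two-step procedure described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the Gaussian base measure built around the k-means centroids, the pseudo-sample size `m`, the prior strength `prior_strength`, the brute-force label alignment, the per-point entropy summary, and the hypothetical function names `kmeans`, `pbb_weights`, `align` and `bbc`. Each replicate uses one common finite approximation of the proper Bayesian bootstrap. Dirichlet weights give `m` pseudo-observations drawn from the prior a total concentration of `prior_strength` and each observed point a concentration of 1. Weighted k-means is then run on the combined sample.

```python
import math
import random
from itertools import permutations


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, rng, weights=None, iters=50):
    """Weighted Lloyd's k-means on a list of tuples; returns the centroids."""
    weights = weights or [1.0] * len(points)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centroids)
            mass[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Empty clusters keep their previous centroid.
        new = [tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids


def pbb_weights(m, n, prior_strength, rng):
    """Assumed approximation: Dirichlet(c/m, ..., c/m, 1, ..., 1) weights,
    sampled as normalised Gamma draws (m pseudo-points first, then n data)."""
    g = [rng.gammavariate(prior_strength / m, 1.0) for _ in range(m)]
    g += [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def align(labels, reference, k):
    """Relabel a partition to best match the reference (brute force, small k)."""
    best = max(permutations(range(k)),
               key=lambda perm: sum(perm[l] == r for l, r in zip(labels, reference)))
    return [best[l] for l in labels]


def bbc(points, k, n_boot=50, prior_strength=10.0, m=100, seed=0):
    """Return (consensus labels, mean per-point Shannon entropy in nats)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1, prior elicitation: k-means on the observed data.
    ref = kmeans(points, k, rng)
    reference = [nearest(p, ref) for p in points]
    sigma = math.sqrt(sum(math.dist(p, ref[l]) ** 2
                          for p, l in zip(points, reference)) / (n * dim))
    # Hypothetical base measure F0: isotropic Gaussians around the centroids,
    # each centroid picked with probability proportional to its cluster size.
    pseudo = [tuple(rng.gauss(c, sigma) for c in ref[reference[rng.randrange(n)]])
              for _ in range(m)]
    combined = pseudo + list(points)
    # Step 2, proper Bayesian bootstrap ensemble: weighted k-means per replicate.
    votes = [[0] * k for _ in range(n)]
    for _ in range(n_boot):
        w = pbb_weights(m, n, prior_strength, rng)
        cents = kmeans(combined, k, rng, weights=w)
        labels = align([nearest(p, cents) for p in points], reference, k)
        for i, lab in enumerate(labels):
            votes[i][lab] += 1
    # Shannon entropy of each point's assignment frequencies across replicates.
    entropies = [-sum(c / n_boot * math.log(c / n_boot) for c in row if c)
                 for row in votes]
    consensus = [max(range(k), key=row.__getitem__) for row in votes]
    return consensus, sum(entropies) / n
```

One way to probe the number of clusters with this sketch is to call `bbc` for several candidate values of K ≥ 2 and compare the returned mean entropy, since lower values indicate more stable assignments across replicates. Note that K = 1 trivially yields zero entropy, and this heuristic is not necessarily the selection criterion used in the paper.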