Louise Alamichel, Daria Bystrova, Julyan Arbel, Guillaume Kon Kam King
Bayesian mixture models (in)consistency for the number of clusters
Scandinavian Journal of Statistics, published 2024-07-23. DOI: 10.1111/sjos.12739 (https://doi.org/10.1111/sjos.12739). Impact factor: 0.8; JCR Q3 (Statistics & Probability).
Citation count: 0
Abstract
Bayesian nonparametric mixture models are common for modeling complex data. While these models are well‐suited for density estimation, recent results have established posterior inconsistency of the number of clusters when the true number of components is finite, for the Dirichlet process and Pitman–Yor process mixture models. We extend these results to additional Bayesian nonparametric priors such as Gibbs‐type processes and finite‐dimensional representations thereof. The latter include the Dirichlet multinomial process and the recently proposed Pitman–Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a postprocessing algorithm introduced for the Dirichlet process can be extended to more general models and provides a consistent method to estimate the number of components.
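For intuition only, the sketch below simulates how many clusters the Dirichlet process prior generates through its Chinese restaurant process representation, and compares this with the exact prior mean. It illustrates prior behaviour and does not reproduce the posterior inconsistency results summarized in the abstract. The function names and the concentration parameter `alpha` are assumptions introduced for this example and do not come from the paper.

```python
import random


def expected_clusters(n, alpha):
    """Exact prior mean of the number of clusters among n observations
    under a Chinese restaurant process with concentration alpha."""
    return sum(alpha / (alpha + i) for i in range(n))


def simulate_crp_clusters(n, alpha, rng):
    """Seat n customers one by one and return the number of occupied tables.

    Only the 'new table' events matter for the cluster count, so the
    table sizes themselves are not tracked here.
    """
    k = 0
    for i in range(n):
        # With i customers already seated, the next one opens a new table
        # with probability alpha / (alpha + i); the first always does.
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 1.0  # hypothetical concentration parameter
    for n in (100, 1000, 10000):
        draws = [simulate_crp_clusters(n, alpha, rng) for _ in range(200)]
        mean_k = sum(draws) / len(draws)
        print(n, round(mean_k, 2), round(expected_clusters(n, alpha), 2))
```

Under this prior the expected number of clusters keeps increasing with the sample size (roughly like `alpha * log(n)`), which gives some intuition for why a finite true number of components may not be recovered without further postprocessing.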
About the journal:
The Scandinavian Journal of Statistics is internationally recognised as one of the leading statistical journals in the world. It was founded in 1974 by four Scandinavian statistical societies. Today more than eighty per cent of the manuscripts are submitted from outside Scandinavia.
It is an international journal devoted to reporting significant and innovative original contributions to statistical methodology, both theory and applications.
The journal specializes in statistical modelling showing particular appreciation of the underlying substantive research problems.
The emergence of specialized methods for analysing longitudinal and spatial data is just one example of an area of important methodological development in which the Scandinavian Journal of Statistics has a particular niche.