{"title":"Product Centred Dirichlet Processes for Bayesian Multiview Clustering.","authors":"Alexander Dombowsky, David B Dunson","doi":"10.1093/jrsssb/qkaf021","DOIUrl":null,"url":null,"abstract":"<p><p>While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data types. For example, clustering patients into subgroups with subgroup membership varying according to the domain of the patient variables. A challenge is how to model the across-view dependence between the partitions of patients into subgroups. The complexities of the partition space make standard methods to model dependence, such as correlation, infeasible. In this article, we propose CLustering with Independence Centring (CLIC), a clustering prior that uses a single parameter to explicitly model dependence between clusterings across views. CLIC is induced by the product centred Dirichlet process (PCDP), a novel hierarchical prior that bridges between independent and equivalent partitions. We show appealing theoretic properties, provide a finite approximation and prove its accuracy, present a marginal Gibbs sampler for posterior computation, and derive closed form expressions for the marginal and joint partition distributions for the CLIC model. On synthetic data and in an application to epidemiology, CLIC accurately characterizes view-specific partitions while providing inference on the dependence level.</p>","PeriodicalId":49982,"journal":{"name":"Journal of the Royal Statistical Society Series B-Statistical Methodology","volume":" ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12392789/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Royal Statistical Society Series B-Statistical Methodology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/jrsssb/qkaf021","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0
Abstract
While there is an immense literature on Bayesian methods for clustering, the multiview case has received little attention. This problem focuses on obtaining distinct but statistically dependent clusterings in a common set of entities for different data types. For example, clustering patients into subgroups with subgroup membership varying according to the domain of the patient variables. A challenge is how to model the across-view dependence between the partitions of patients into subgroups. The complexities of the partition space make standard methods to model dependence, such as correlation, infeasible. In this article, we propose CLustering with Independence Centring (CLIC), a clustering prior that uses a single parameter to explicitly model dependence between clusterings across views. CLIC is induced by the product centred Dirichlet process (PCDP), a novel hierarchical prior that bridges between independent and equivalent partitions. We show appealing theoretic properties, provide a finite approximation and prove its accuracy, present a marginal Gibbs sampler for posterior computation, and derive closed form expressions for the marginal and joint partition distributions for the CLIC model. On synthetic data and in an application to epidemiology, CLIC accurately characterizes view-specific partitions while providing inference on the dependence level.
期刊介绍:
Series B (Statistical Methodology) aims to publish high quality papers on the methodological aspects of statistics and data science more broadly. The objective of papers should be to contribute to the understanding of statistical methodology and/or to develop and improve statistical methods; any mathematical theory should be directed towards these aims. The kinds of contribution considered include descriptions of new methods of collecting or analysing data, with the underlying theory, an indication of the scope of application and preferably a real example. Also considered are comparisons, critical evaluations and new applications of existing methods, contributions to probability theory which have a clear practical bearing (including the formulation and analysis of stochastic models), statistical computation or simulation where original methodology is involved and original contributions to the foundations of statistical science. Reviews of methodological techniques are also considered. A paper, even if correct and well presented, is likely to be rejected if it only presents straightforward special cases of previously published work, if it is of mathematical interest only, if it is too long in relation to the importance of the new material that it contains or if it is dominated by computations or simulations of a routine nature.