Debiasing weighted multi-view k-means clustering based on causal regularization
Authors: Xiuqi Huang, Hong Tao, Haotian Ni, Chenping Hou
Journal: Pattern Recognition, Volume 160, Article 111195 (Impact Factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-11-16
DOI: 10.1016/j.patcog.2024.111195
URL: https://www.sciencedirect.com/science/article/pii/S0031320324009464
Citations: 0
Abstract
In the field of unsupervised learning, many methods such as clustering rely on exploring the correlations among features. However, considering these correlations is not always advantageous for learning models. The biased selection of data may lead to redundant and unstable correlations among features, adversely affecting the performance of learning models. Multi-view data presents more complex feature correlations with potential redundancy and varying distributions across views, necessitating detailed analysis. This paper proposes a causal regularized debiased multi-view k-means clustering (DMKC) method to counteract redundant feature correlations stemming from sample selection bias. This method introduces a covariate weighted balance method from causal inference to mitigate redundant bias in multi-view clustering by adjusting sample weights. The approach combines sample and view weights within a k-means loss framework, effectively eliminating feature redundancy and enhancing clustering performance amidst sample selection bias. The optimization process of the relevant parameters is detailed in this paper, and comprehensive experiments demonstrate the effectiveness of the method.
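The abstract describes combining sample weights and view weights inside a k-means loss but does not give the objective or the update rules. The sketch below is therefore only an illustration of that general idea, not the DMKC algorithm: it assumes both sets of weights are already given and held fixed, whereas the paper learns the sample weights through covariate balancing and optimizes the remaining parameters jointly. The function name `weighted_multiview_kmeans` and the parameters `sample_w` and `view_w` are hypothetical.

```python
import numpy as np

def _weighted_dist(views, centers, view_w):
    """View-weighted squared distance from every sample to every centroid."""
    n, k = views[0].shape[0], centers[0].shape[0]
    dist = np.zeros((n, k))
    for X, C, a in zip(views, centers, view_w):
        dist += a * ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return dist

def weighted_multiview_kmeans(views, sample_w, view_w, k, n_iter=20, seed=0):
    """Lloyd-style k-means with fixed per-sample and per-view weights.

    Assumption (not from the paper): the loss is
    sum_i sample_w[i] * sum_v view_w[v] * ||x_i^(v) - c_{label_i}^(v)||^2.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    # Initialise the centroids of every view from the same k random samples.
    idx = rng.choice(n, size=k, replace=False)
    centers = [X[idx].astype(float) for X in views]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the view-weighted distance.
        labels = _weighted_dist(views, centers, view_w).argmin(axis=1)
        # Update step: each centroid is the sample-weighted mean of its members.
        for j in range(k):
            mask = labels == j
            if sample_w[mask].sum() > 0:  # skip empty clusters
                for v, X in enumerate(views):
                    centers[v][j] = np.average(
                        X[mask], axis=0, weights=sample_w[mask]
                    )
    dist = _weighted_dist(views, centers, view_w)
    loss = float((sample_w * dist[np.arange(n), labels]).sum())
    return labels, centers, loss

# Toy usage: two views of four samples forming two obvious groups.
view1 = np.array([[0.0], [0.2], [10.0], [10.2]])
view2 = np.array([[0.0, 0.0], [0.0, 0.2], [5.0, 5.0], [5.0, 5.2]])
labels, centers, loss = weighted_multiview_kmeans(
    [view1, view2],
    sample_w=np.array([1.0, 3.0, 1.0, 1.0]),
    view_w=np.array([0.5, 0.5]),
    k=2,
)
```

With non-negative sample weights, the weights do not change which centroid is nearest to a sample; they only change where the centroids move in the update step, which is how up-weighting or down-weighting samples can shift the resulting clusters.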
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.