Unifying complete and incomplete multi-view clustering through an information-theoretic generative model

IF 6 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2024-11-22 DOI:10.1016/j.neunet.2024.106901

Yanghang Zheng, Guoxu Zhou, Haonan Huang, Xintao Luo, Zhenhao Huang, Qibin Zhao

Neural Networks, Volume 182, Article 106901. https://www.sciencedirect.com/science/article/pii/S089360802400830X

Citations: 0

Abstract

Incomplete Multi-View Clustering (IMVC) has become a rapidly growing research topic, driven by the prevalence of incomplete data in real-world applications. Although many approaches have been proposed to address this challenge, most do not clearly explain the learning process behind recovery. Moreover, most consider only inter-view relationships and neglect the relationships between samples. The influence of irrelevant information is also usually ignored, which prevents these methods from achieving optimal performance. To tackle these issues, we aim to unify compLete and incOmplete multi-view clusterinG through an Information-theoretiC generative model (LOGIC). Specifically, we define three principles based on information theory: comprehensiveness, consensus, and compressibility. First, we explain that the essence of learning to recover missing views is to maximize the mutual information between the common representation and the data from each view. Second, we leverage the consensus principle to maximize the mutual information between view distributions, uncovering the associations between different samples. Finally, guided by the principle of compressibility, we remove as much task-irrelevant information as possible to ensure that the common representation effectively extracts semantic information. Furthermore, LOGIC can serve as a plug-and-play missing-data recovery module for multi-view clustering models. Extensive empirical studies demonstrate the effectiveness of our approach in generating missing views. In clustering tasks, our method consistently outperforms state-of-the-art (SOTA) techniques in terms of accuracy, normalized mutual information, and purity, showcasing its superiority in both recovery and clustering performance.
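The abstract reports results in terms of accuracy, normalized mutual information, and purity. As a rough illustration of what these three clustering metrics measure, here is a minimal standard-library sketch. It is not the authors' evaluation code: the function names are hypothetical, the NMI variant (normalization by the arithmetic mean of the two entropies) is an assumption because the abstract does not say which one is used, and the accuracy search is brute force over label mappings, which is only practical for a handful of clusters (evaluation scripts typically use the Hungarian algorithm instead).

```python
from collections import Counter
from itertools import permutations
from math import log


def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    total = 0
    for cluster in set(y_pred):
        members = [t for t, p in zip(y_true, y_pred) if p == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)


def nmi(y_true, y_pred):
    """Mutual information divided by the arithmetic mean of the label entropies.

    Assumption: arithmetic-mean normalization; other variants (min, max,
    geometric mean) exist and give different values.
    """
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    h_t = -sum(c / n * log(c / n) for c in count_t.values())
    h_p = -sum(c / n * log(c / n) for c in count_p.values())
    denom = (h_t + h_p) / 2
    # Both labelings constant: treat as perfect agreement by convention.
    return mi / denom if denom > 0 else 1.0


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings from cluster ids to class ids.

    Brute force over permutations; assumes labels are sortable and never None.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with placeholder targets so every cluster gets a distinct target.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)
```

All three functions take two equal-length sequences, the ground-truth class labels and the predicted cluster ids, and return a value in [0, 1]; cluster ids need not match class ids, since each metric is invariant to how clusters are numbered.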

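Source journal

Neural Networks (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days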

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.