Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society. Pub Date: 2024-10-21. DOI: 10.1109/TIP.2024.3480701

Shengsheng Qian;Dizhan Xue;Jun Hu;Huaiwen Zhang;Changsheng Xu

Volume 33, pages 6158-6172. Publication type: Journal Article. Source: https://ieeexplore.ieee.org/document/10726686/

Citations: 0

Abstract




With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data is often more practical. Partially View-aligned Representation Learning (PVRL) has therefore attracted increasing attention. Once multi-view representations have been aligned according to their semantic similarity, the aligned representations can be used to facilitate downstream tasks such as clustering. However, existing methods may be constrained by the following limitations: 1) they learn semantic relations across views from the known correspondences, which are incomplete, and the presence of false negative pairs (FNP) can significantly impair learning; 2) existing strategies for alleviating the impact of FNP rely on intuition and lack a theoretical explanation of the conditions under which they apply; 3) they attempt to find FNP based on distance in the common space and fail to explore semantic relations between multi-view data. In this paper, we propose Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) for PVRL to address these issues. First, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss so that it approximates the similarity matrix of supervised contrastive learning (CL). Second, we establish the theoretical foundation for the proposed method by analyzing the error bounds of the loss function and its derivatives between our method and supervised CL. Third, we propose Deep Variational Nonparametric Clustering (DeepVNC), which designs a deep reparameterized variational inference for Dirichlet process Gaussian mixture models in order to construct cluster-level similarity between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and performance of the proposed CL method. Extensive experiments on four widely used benchmark datasets show that the proposed method outperforms state-of-the-art methods.
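The abstract does not state the loss in closed form, so the following is only a rough illustrative sketch of the general idea of letting cluster-level similarity guide a cross-view contrastive loss. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity, the temperature value, the choice of the inner product of soft cluster assignments as the cluster-level similarity, and the cross-entropy form of the loss. The soft assignments are simply given as inputs here; in the paper they would come from DeepVNC, which is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity of two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_similarity(q1, q2):
    """Hypothetical cluster-level similarity between two views.

    q1[i] and q2[j] are soft cluster assignments (each row sums to 1).
    Entry [i][j] is the inner product of the two assignment vectors, i.e.
    the probability that both samples fall in the same cluster if the
    assignments are treated as independent.
    """
    return [[sum(a * b for a, b in zip(qi, qj)) for qj in q2] for qi in q1]


def soft_target_contrastive_loss(z1, z2, target, temperature=0.5):
    """Cross-entropy between row-normalized targets and the softmax over
    cross-view cosine similarities, averaged over the rows of z1.

    With an identity target this reduces to a one-directional InfoNCE-style
    loss in which every non-corresponding pair is a negative. A non-identity
    target stops same-cluster pairs from being treated as negatives.
    Assumes every target row has positive total mass.
    """
    total = 0.0
    for i, zi in enumerate(z1):
        logits = [cosine(zi, zj) / temperature for zj in z2]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        row_sum = sum(target[i])
        for j, logit in enumerate(logits):
            total -= (target[i][j] / row_sum) * (logit - log_norm)
    return total / len(z1)


# Toy data: samples 0 and 1 share a cluster, sample 2 is in another one.
z1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # view-1 embeddings
z2 = [[1.0, 0.1], [0.8, 0.0], [0.1, 1.0]]  # view-2 embeddings
q1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # view-1 soft assignments
q2 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # view-2 soft assignments

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
guided = cluster_similarity(q1, q2)

print("identity target:", soft_target_contrastive_loss(z1, z2, identity))
print("cluster-guided target:", soft_target_contrastive_loss(z1, z2, guided))
```

With the identity target, the pair (sample 0 in view 1, sample 1 in view 2) is pushed apart even though both samples belong to the same cluster, which is the false-negative situation the abstract describes. With the cluster-guided target that pair receives positive weight instead. The two loss values are not directly comparable, because a soft target adds an entropy term to the cross-entropy.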

