Multi-view contrastive learning for unsupervised 3D model retrieval and classification

IF 2.7 | Zone 3 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing-Image Communication | Pub Date: 2025-05-06 | DOI: 10.1016/j.image.2025.117333

Wenhui Li, Zhenghao Fang, Dan Song, Weizhi Nie, Xuanya Li, An-An Liu

Volume 138, Article 117333. Full text: https://www.sciencedirect.com/science/article/pii/S0923596525000803

Abstract

Unsupervised 3D model retrieval and classification have attracted considerable attention because of their wide range of applications. Despite substantial progress, both tasks remain challenging because no supervised information is available to guide the optimization of the neural network. Existing unsupervised methods usually rely on clustering algorithms to generate pseudo labels for 3D models. However, clustering cannot fully exploit multi-view structure information, and noisy information can cause it to misguide the unsupervised learning process. To address these limitations, this paper proposes a Multi-View Contrastive Learning (MVCL) method that takes full advantage of multi-view structure information to optimize the neural network. Specifically, we propose a multi-view grouping scheme and a multi-view contrastive learning scheme to mine self-supervised information and learn discriminative feature representations. The multi-view grouping scheme divides the multiple views of each 3D model into two groups and minimizes the group-level difference, which helps to explore the internal characteristics of the 3D structural information. To learn the relationships among multiple views in an unsupervised manner, we propose a two-stream asymmetric framework, consisting of a main network and a subsidiary network, that ensures the discriminability of the learned features. Extensive 3D model retrieval and classification experiments on two challenging datasets demonstrate the superiority of the method.
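The grouping and contrastive ideas in the abstract can be illustrated with a small sketch. The abstract does not say how views are assigned to groups, how a group is pooled, or which loss is used, so everything concrete here is an assumption: a random equal split, mean pooling, a cosine-based group difference, and an InfoNCE-style contrastive loss. The function names (`split_views`, `group_difference`, `info_nce`) are hypothetical, and the two-stream main/subsidiary networks are not modeled; the inputs stand in for per-view features that some encoder has already produced.

```python
import math
import random


def mean_pool(views):
    """Average a list of per-view feature vectors into one group feature."""
    dim = len(views[0])
    return [sum(v[d] for v in views) / len(views) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (small epsilon avoids 0/0)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def split_views(view_feats, rng):
    """Assumed rule: shuffle the views and cut them into two halves."""
    idx = list(range(len(view_feats)))
    rng.shuffle(idx)
    half = len(idx) // 2
    group_1 = [view_feats[i] for i in idx[:half]]
    group_2 = [view_feats[i] for i in idx[half:]]
    return group_1, group_2


def group_difference(view_feats, rng):
    """Assumed group-level difference: 1 - cosine of the two pooled groups."""
    group_1, group_2 = split_views(view_feats, rng)
    return 1.0 - cosine(mean_pool(group_1), mean_pool(group_2))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: anchors[i] should match positives[i] and
    be dissimilar to positives[j] for j != i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy 3D models with four views each; the feature values are made up.
    chair = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.1, 0.1, 0.0], [1.0, 0.0, 0.0]]
    table = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.1, 0.1], [0.0, 1.0, 0.0]]
    groups = [split_views(m, rng) for m in (chair, table)]
    anchors = [mean_pool(g1) for g1, _ in groups]
    positives = [mean_pool(g2) for _, g2 in groups]
    print("group difference (chair):", group_difference(chair, rng))
    print("contrastive loss:", info_nce(anchors, positives))
```

In this sketch the two groups of the same model act as a positive pair and the groups of other models in the batch act as negatives, so minimizing either quantity pulls the two halves of a model's views toward a common representation.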

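Source journal

Signal Processing-Image Communication (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.40
Self-citation rate: 2.90%
Articles published: 138
Review time: 5.2 months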

Journal description: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are: (1) to present a forum for the advancement of the theory and practice of image communication; (2) to stimulate cross-fertilization between areas that are similar in nature but have traditionally been separated, for example various aspects of visual communications and information systems; and (3) to contribute to a rapid exchange of information between the industrial and academic environments.

The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world.

Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, and architectures for image/video processing and communication.