A cross-modal Siamese representation learning network for point cloud understanding

Impact Factor: 4.0 | CAS Region 3 (Computer Science) | JCR Q1 (Computer Science, Hardware & Architecture)
Fei Wang, Jia Wu, Rui Ma, Yisha Liu, Zengshuai Qiu
{"title":"A cross-modal Siamese representation learning network for point cloud understanding","authors":"Fei Wang ,&nbsp;Jia Wu ,&nbsp;Rui Ma ,&nbsp;Yisha Liu ,&nbsp;Zengshuai Qiu","doi":"10.1016/j.compeleceng.2025.110426","DOIUrl":null,"url":null,"abstract":"<div><div>Learning effective representations from unannotated point cloud data is a challenging task in self-supervised learning. Recently, methods that use point clouds and images for cross-modal learning have achieved impressive performance. However, these methods still have some shortcomings in exploring the latent information between these two modalities. To address this issue, we propose a cross-modal Siamese representation learning network called CrossSiamese. This network uses point clouds and their rendered images for cross-modal contrastive learning. We introduce an intra-modal prediction mechanism in the network to capture the internal information in the point cloud and image modalities. In addition, we introduce a cross-modal cross-prediction mechanism to capture mutual information between the two modalities. Experimental results show that our method improves the accuracy of linear classification for 3D objects by 0.4% on ModelNet40 and 1.7% on ScanObjectNN compared to existing baseline methods. Additionally, experiments on few-shot object classification and 3D object part segmentation further validate the effectiveness of our method. These results indicate that the representations learned by our method have generalization ability and can be effectively transferred to these three downstream tasks.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"126 ","pages":"Article 110426"},"PeriodicalIF":4.0000,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Electrical Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045790625003696","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Learning effective representations from unannotated point cloud data is a challenging task in self-supervised learning. Recently, methods that use point clouds and images for cross-modal learning have achieved impressive performance. However, these methods still fall short of fully exploiting the latent information shared between the two modalities. To address this issue, we propose a cross-modal Siamese representation learning network called CrossSiamese, which uses point clouds and their rendered images for cross-modal contrastive learning. We introduce an intra-modal prediction mechanism to capture the internal information within the point cloud and image modalities, and a cross-modal cross-prediction mechanism to capture the mutual information between them. Experimental results show that, compared with existing baseline methods, our method improves linear classification accuracy for 3D objects by 0.4% on ModelNet40 and 1.7% on ScanObjectNN. Experiments on few-shot object classification and 3D object part segmentation further validate its effectiveness. These results indicate that the learned representations generalize well and transfer effectively to all three downstream tasks.
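The full paper is not reproduced on this page, so no implementation details are available here. Below is a minimal, hypothetical PyTorch sketch of the kind of Siamese objective the abstract describes: intra-modal prediction (each augmented view of a modality predicts the other view) combined with cross-modal cross-prediction (each modality predicts the other). The SimSiam-style negative-cosine loss with stop-gradient, the MLP head design, the embedding dimension, and the placeholder encoders are all assumptions for illustration, not details taken from the paper.

# Hypothetical sketch (not the authors' code): a SimSiam-style Siamese
# objective with intra-modal prediction and cross-modal cross-prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictionHead(nn.Module):
    """Small MLP that predicts one embedding from another (assumed design)."""
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

def neg_cosine(p, z):
    # Negative cosine similarity with stop-gradient on the target embedding,
    # as in SimSiam-style Siamese representation learning.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

class CrossModalSiamese(nn.Module):
    def __init__(self, point_encoder, image_encoder, dim=256):
        super().__init__()
        self.point_encoder = point_encoder  # e.g. a PointNet-style backbone
        self.image_encoder = image_encoder  # e.g. a CNN over rendered images
        self.pred_pp = PredictionHead(dim)  # point view -> point view
        self.pred_ii = PredictionHead(dim)  # image view -> image view
        self.pred_pi = PredictionHead(dim)  # point -> image (cross-modal)
        self.pred_ip = PredictionHead(dim)  # image -> point (cross-modal)

    def forward(self, pts_a, pts_b, img_a, img_b):
        # Two augmented views per modality, one shared encoder per modality.
        zp_a, zp_b = self.point_encoder(pts_a), self.point_encoder(pts_b)
        zi_a, zi_b = self.image_encoder(img_a), self.image_encoder(img_b)
        # Intra-modal prediction: each view predicts the other view.
        intra = (neg_cosine(self.pred_pp(zp_a), zp_b)
                 + neg_cosine(self.pred_pp(zp_b), zp_a)
                 + neg_cosine(self.pred_ii(zi_a), zi_b)
                 + neg_cosine(self.pred_ii(zi_b), zi_a)) / 4
        # Cross-modal cross-prediction: each modality predicts the other.
        cross = (neg_cosine(self.pred_pi(zp_a), zi_a)
                 + neg_cosine(self.pred_ip(zi_a), zp_a)) / 2
        return intra + cross

# Smoke test with toy encoders standing in for real backbones.
if __name__ == "__main__":
    point_enc = nn.Sequential(nn.Flatten(), nn.Linear(1024 * 3, 256))
    image_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
    model = CrossModalSiamese(point_enc, image_enc, dim=256)
    pts = torch.randn(8, 1024, 3)
    imgs = torch.randn(8, 3, 32, 32)
    loss = model(pts, pts + 0.01 * torch.randn_like(pts), imgs, imgs.flip(-1))
    print(float(loss))

The stop-gradient in neg_cosine is what lets a Siamese setup like this avoid representational collapse without negative pairs; whether CrossSiamese uses this exact mechanism or a contrastive loss with negatives is not specified in the abstract.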
Source journal: Computers & Electrical Engineering (Engineering Technology: Electrical & Electronic Engineering)
CiteScore: 9.20
Self-citation rate: 7.00%
Articles published per year: 661
Average review time: 47 days
Journal description: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.