View sequence prediction GAN: unsupervised representation learning for 3D shapes by decomposing view content and viewpoint variance

Heyu Zhou, Jiayu Li, Xianzhu Liu, Yingda Lyu, Haipeng Chen, An-An Liu
DOI: 10.1007/s00530-024-01431-8 · Journal Article · Published 2024-08-01 · IF 4.3 · JCR Q1 (Engineering, Electrical & Electronic) · Citations: 0

Abstract

Unsupervised representation learning for 3D shapes has become a critical problem for large-scale 3D shape management. Recent model-based methods for this task require additional information for training, while popular view-based methods often overlook viewpoint variance in view prediction, leading to uninformative 3D features that limit their practical applications. To address these issues, we propose an unsupervised 3D shape representation learning method called View Sequence Prediction GAN (VSP-GAN), which decomposes view content and viewpoint variance. VSP-GAN takes several adjacent views of a 3D shape as input and outputs the subsequent views. The key idea is to split the multi-view sequence into two perceptible parts, view content and viewpoint variance, and to encode them independently with separate encoders. Using this information, we design a decoder that mirrors the architecture of the content encoder and predicts the view sequence in multiple steps. In addition, to improve the quality of the reconstructed views, we propose a novel hierarchical view prediction loss that enhances view realism, semantic consistency, and detail retention. We evaluate the proposed VSP-GAN on two popular 3D CAD datasets, ModelNet10 and ModelNet40, for 3D shape classification and retrieval. The experimental results demonstrate that our VSP-GAN learns more discriminative features than state-of-the-art methods.
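The input/output protocol described above (several adjacent views in, the subsequent views out, predicted step by step) can be sketched schematically. This is purely an illustrative assumption, not the paper's model: `split_view_sequence`, `predict_sequence`, and the averaging `step_fn` below are hypothetical placeholders standing in for the encoder/decoder pair.

```python
import numpy as np

def split_view_sequence(views, n_input):
    """Split a circularly rendered multi-view sequence into the observed
    input views and the subsequent views the generator must predict.
    `views`: array of shape (V, H, W), V views rendered around the shape."""
    inputs = views[:n_input]
    targets = views[n_input:]
    return inputs, targets

def predict_sequence(inputs, n_steps, step_fn):
    """Multi-step prediction: each new view is generated from the growing
    context, mimicking an autoregressive decoder rollout."""
    context = list(inputs)
    predicted = []
    for _ in range(n_steps):
        nxt = step_fn(np.stack(context))  # placeholder for encode/decode
        predicted.append(nxt)
        context.append(nxt)               # feed the prediction back in
    return np.stack(predicted)

# Toy data: 12 views of 32x32 pixels; the step function here just averages
# the context views (a placeholder, not the actual VSP-GAN decoder).
views = np.random.rand(12, 32, 32)
obs, tgt = split_view_sequence(views, n_input=4)
pred = predict_sequence(obs, n_steps=len(tgt), step_fn=lambda c: c.mean(axis=0))
```

The rollout reuses each predicted view as context for the next step, which is one natural way to realize "predicts the view sequence in multiple steps".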

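The abstract names three objectives for the hierarchical view prediction loss: realism, semantic consistency, and detail retention. One plausible composition, assumed here for illustration and not taken from the paper, is a weighted sum of a pixel-level L1 term, a feature-space distance, and a non-saturating adversarial term; `feat_fn`, `disc_fn`, and the weights are hypothetical stand-ins.

```python
import numpy as np

def hierarchical_view_loss(pred, target, feat_fn, disc_fn,
                           w_pix=1.0, w_sem=1.0, w_adv=0.1):
    # Pixel-level L1 distance: encourages detail retention.
    l_pix = np.abs(pred - target).mean()
    # Feature-level L2 distance: encourages semantic consistency.
    l_sem = ((feat_fn(pred) - feat_fn(target)) ** 2).mean()
    # Non-saturating adversarial term: encourages realism.
    l_adv = -np.log(disc_fn(pred) + 1e-8)
    return w_pix * l_pix + w_sem * l_sem + w_adv * l_adv

# Placeholder networks: a pooled "feature extractor" and a sigmoid "discriminator".
feat_fn = lambda v: v.reshape(v.shape[0], -1).mean(axis=1)
disc_fn = lambda v: 1.0 / (1.0 + np.exp(-v.mean()))

rng = np.random.default_rng(0)
pred = rng.random((8, 32, 32))
target = rng.random((8, 32, 32))
loss = hierarchical_view_loss(pred, target, feat_fn, disc_fn)
```

Combining terms at pixel, feature, and discriminator level is the standard way such "hierarchical" losses are built in view-synthesis GANs; the exact terms and weights in VSP-GAN may differ.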
