Style and Semantic Mix for Cross-Domain 3D Model Retrieval

IF 2.6 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation Pub Date : 2025-01-31 DOI:10.1016/j.jvcir.2025.104390

Xinwei Fu, Dan Song, Yue Yang, Yuyi Zhang, Bo Wang

Volume 107, Article 104390. Full text: https://www.sciencedirect.com/science/article/pii/S1047320325000045

Citations: 0

Abstract


S2Mix: Style and Semantic Mix for cross-domain 3D model retrieval

With the development of deep neural networks and image processing technology, cross-domain 3D model retrieval algorithms based on 2D images have attracted much attention; they use visual information from labeled 2D images to assist in processing unlabeled 3D models. Existing unsupervised cross-domain 3D model retrieval algorithms use domain adaptation to narrow the modality gap between 2D images and 3D models. However, these methods overlook the specific style visual information that distinguishes the 2D image domain from the 3D model domain, which is crucial for reducing the domain distribution discrepancy. To address this issue, this paper proposes a Style and Semantic Mix (S$^{2}$Mix) network for cross-domain 3D model retrieval, which fuses style visual information and semantic consistency features between different domains. Specifically, we design a style mix module that operates on shallow feature maps, which are closer to the input data, and learns 2D image and 3D model features with an intermediate-domain mixed style to narrow the domain distribution discrepancy. In addition, to improve the semantic prediction accuracy of unlabeled samples, a semantic mix module is designed to operate on deep features, fusing features from reliable unlabeled 3D model samples and 2D image samples that are semantically consistent. Our experiments demonstrate the effectiveness of the proposed S$^{2}$Mix on two commonly used cross-domain 3D model retrieval datasets, MI3DOR-1 and MI3DOR-2.
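The abstract gives no formulas for either module, so the following is only a rough sketch of what the two mixing steps could look like, not the authors' implementation. It assumes that "style" is captured by per-channel mean and standard deviation of a shallow feature map (as in common statistics-mixing approaches) and that semantic mixing is a convex combination of deep features gated by a confidence threshold. The function names, the `lam` weight, and the `threshold` value are all hypothetical.

```python
import math


def channel_stats(channel, eps=1e-6):
    """Mean and standard deviation of one channel (a flat list of activations)."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def style_mix(feat_2d, feat_3d, lam):
    """Re-style a 2D-image feature map with statistics interpolated toward a 3D-model view.

    ASSUMPTION: style = per-channel mean/std; the paper may define it differently.
    feat_2d, feat_3d: lists of channels, each a flat list of floats.
    lam: weight in [0, 1]; 1.0 keeps the 2D style, 0.0 adopts the 3D style.
    """
    mixed = []
    for ch_a, ch_b in zip(feat_2d, feat_3d):
        mu_a, sig_a = channel_stats(ch_a)
        mu_b, sig_b = channel_stats(ch_b)
        # Intermediate-domain statistics: interpolate between the two styles.
        mu_mix = lam * mu_a + (1.0 - lam) * mu_b
        sig_mix = lam * sig_a + (1.0 - lam) * sig_b
        # Normalize with the source statistics, then apply the mixed ones.
        mixed.append([sig_mix * (v - mu_a) / sig_a + mu_mix for v in ch_a])
    return mixed


def semantic_mix(feat_3d, probs_3d, feat_2d, label_2d, lam, threshold=0.9):
    """Fuse deep features of an unlabeled 3D sample and a labeled 2D sample.

    ASSUMPTION: a 3D sample is "reliable" when its top class probability
    reaches `threshold`, and "semantically consistent" when its pseudo-label
    equals the 2D label. Returns None when either condition fails.
    """
    pseudo = max(range(len(probs_3d)), key=probs_3d.__getitem__)
    if probs_3d[pseudo] < threshold or pseudo != label_2d:
        return None
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_3d, feat_2d)]


if __name__ == "__main__":
    image_feat = [[1.0, 2.0, 3.0, 4.0]]      # one channel, four positions
    model_feat = [[10.0, 20.0, 30.0, 40.0]]
    print(style_mix(image_feat, model_feat, lam=0.5))
    print(semantic_mix([1.0, 0.0], [0.05, 0.95], [0.0, 1.0], 1, lam=0.5))
```

In this sketch the style step changes only first- and second-order channel statistics and leaves the normalized spatial pattern of the 2D feature untouched, while the semantic step skips any pair whose pseudo-label is uncertain or disagrees with the 2D label.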


Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.