A Multimodal Fusion Network Based on Hypergraph for 3D Shape Retrieval

Xiaoting Huang, Liping Nong, Wenhui Zhang

2022 IEEE 22nd International Conference on Communication Technology (ICCT), published 2022-11-11. DOI: 10.1109/ICCT56141.2022.10072638

Abstract

3D shape retrieval is an important research topic in modern multimedia information retrieval. Point clouds and meshes are commonly used representations of 3D data and have strong shape description capabilities. However, existing multimodal 3D shape retrieval methods lack fusion learning across these two irregular data types. In this paper, we design a depthwise separable hypergraph convolution and build a multimodal fusion network based on it, which uses hypergraphs to model higher-order relationships between data and improves 3D shape retrieval through the effective fusion of point cloud and mesh data. First, the initial feature descriptors of the point cloud and mesh modalities are extracted using pretrained networks. Next, channel shuffle is performed on the initial feature descriptors to mix the multimodal data, and the k-Nearest Neighbour (kNN) algorithm is then used to construct the corresponding hypergraphs. Finally, the depthwise separable hypergraph convolution is applied to extract discriminative shape representations and fuse the multimodal information. During training, the fusion network is jointly constrained by a mean square error loss and a cross-entropy loss. The proposed network is applied to the 3D shape retrieval task, and experimental results demonstrate that it greatly improves retrieval accuracy.
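The pipeline the abstract describes can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact hypergraph normalization is taken from the standard HGNN formulation, and the "depthwise separable" split (per-channel propagation over the hypergraph followed by a pointwise channel-mixing projection, by analogy with depthwise separable 2D convolution) is an assumption about the paper's design. All shapes and helper names are hypothetical.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Mix channels across modality groups (ShuffleNet-style shuffle).
    x: (N, C) feature matrix with C divisible by `groups` -- e.g. point
    cloud and mesh descriptors concatenated along the channel axis."""
    n, c = x.shape
    return x.reshape(n, groups, c // groups).transpose(0, 2, 1).reshape(n, c)

def knn_hypergraph(x, k):
    """Build an (N x N) incidence matrix H: hyperedge j connects node j
    and its k nearest neighbours in feature space."""
    n = x.shape[0]
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    h = np.zeros((n, n))
    for j in range(n):
        nbrs = np.argsort(d2[:, j])[:k + 1]  # node j itself plus k neighbours
        h[nbrs, j] = 1.0
    return h

def hypergraph_conv(x, h, theta):
    """Standard hypergraph convolution with identity hyperedge weights:
    X' = Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} X Theta."""
    dv_inv = np.diag(1.0 / np.sqrt(h.sum(1)))  # node-degree normalization
    de_inv = np.diag(1.0 / h.sum(0))           # hyperedge-degree normalization
    return dv_inv @ h @ de_inv @ h.T @ dv_inv @ x @ theta

def depthwise_separable_hconv(x, h, pointwise):
    """Assumed depthwise separable variant: propagate each channel over the
    hypergraph independently (depthwise, identity Theta), then mix channels
    with a pointwise projection."""
    depthwise = hypergraph_conv(x, h, np.eye(x.shape[1]))
    return depthwise @ pointwise
```

A usage sketch under the same assumptions: concatenate the pretrained point-cloud and mesh descriptors along channels, shuffle with `groups=2` (one group per modality), build the hypergraph on the shuffled features, and stack a few `depthwise_separable_hconv` layers before the retrieval head trained with the joint MSE + cross-entropy objective.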