Interpretable Learning-Based Multi-Modal Hashing Analysis for Multi-View Feature Representation Learning

2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR)
Pub Date: 2022-08-01
DOI: 10.1109/MIPR54900.2022.00016

Lei Gao, L. Guan

Citations: 1

Abstract

In this work, an interpretable learning-based multi-modal hashing analysis (ILMMHA) model is proposed with application to multi-view feature representation learning. In the proposed model, a cascade network structure is first used to reveal the intrinsic semantic representation of the input variables. Then, a multi-modal hashing (MMH) method is integrated with the explored semantic representation, generating an interpretable learning-based model for multi-view feature representation. Since MMH can measure semantic similarity across multiple variables jointly, it provides a natural link between the explored intrinsic semantic representation and its similarity across multi-modal data. Benefiting from the integration of the cascade structure and MMH, the ILMMHA model yields a new, high-quality multi-view feature representation. To demonstrate the effectiveness and generality of the ILMMHA model, we conduct experiments on cross-modal audio-visual emotion recognition and text-image recognition tasks. Experimental results demonstrate the superiority of the proposed model on multi-view feature representation learning.
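
The abstract does not say how MMH computes its binary codes. As a generic illustration only, and not the authors' ILMMHA model, the sketch below shows one simple form of multi-view hashing: each view x_m is projected by its own matrix W_m into a shared k-dimensional space, the projections are summed, and the result is binarized, b = sign(Σ_m W_m x_m); similarity between samples is then measured by Hamming distance. The function names, the 16-bit code length, the audio/visual feature dimensions, and the random Gaussian projections are all hypothetical assumptions; an actual MMH method would learn W_m from data.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def project(W: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product W @ x with plain lists (W is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fused_hash(views: Sequence[Sequence[float]], projections: Sequence[Matrix]) -> List[int]:
    """Hash all views of one sample into a single k-bit code.

    Computes b = sign(sum_m W_m x_m): each view x_m is mapped into a shared
    k-dimensional space by its own W_m, the results are summed, and every
    coordinate is binarized (1 if >= 0, else 0).
    """
    k = len(projections[0])
    total = [0.0] * k
    for x, W in zip(views, projections):
        for i, v in enumerate(project(W, x)):
            total[i] += v
    return [1 if t >= 0 else 0 for t in total]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count differing bits; a smaller distance means more similar samples."""
    return sum(ai != bi for ai, bi in zip(a, b))


def random_projection(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian matrix standing in for a learned projection (hypothetical)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    k = 16  # hypothetical code length in bits
    # Hypothetical audio (4-dim) and visual (3-dim) projections; a real
    # multi-modal hashing method would learn these from data.
    W_audio = random_projection(k, 4, rng)
    W_visual = random_projection(k, 3, rng)
    projections = [W_audio, W_visual]

    a = fused_hash([[0.9, 0.1, -0.3, 0.5], [0.2, 0.8, -0.1]], projections)
    b = fused_hash([[0.85, 0.15, -0.25, 0.55], [0.25, 0.75, -0.05]], projections)
    c = fused_hash([[-0.9, -0.1, 0.3, -0.5], [-0.2, -0.8, 0.1]], projections)
    print("code a:", a)
    print("Hamming(a, b), near-duplicate:", hamming(a, b))
    print("Hamming(a, c), negated input:", hamming(a, c))
```

Because sample c is the exact negation of sample a, every summed coordinate flips sign and Hamming(a, c) equals the full code length, while the near-duplicate b should typically land much closer to a. Summing the per-view projections before binarizing is what makes the code "joint": one bit string summarizes all views at once.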
