Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval

IF 3.4 · CAS Zone 2 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Dongxue Shi, Zheng Liu, Shanshan Gao, Ang Li
Journal: Applied Intelligence, vol. 55, no. 1
DOI: 10.1007/s10489-024-06060-2
Published: 2024-11-19 (Journal Article)
URL: https://link.springer.com/article/10.1007/s10489-024-06060-2
Citations: 0

Abstract

Cross-modal retrieval aims to retrieve related items in one modality using a query from another modality. As its foundational and key task, image-text retrieval has attracted significant research interest. In recent years, hashing techniques have gained widespread attention for large-scale retrieval due to their minimal storage requirements and rapid query processing. However, existing hashing approaches either learn unified representations shared by both modalities or specific representations within each modality. The former lacks modality-specific information, while the latter ignores the relationships between image-text pairs across modalities. We therefore propose a supervised hashing method that combines intra-modality and inter-modality matrix factorization. The method integrates semantic labels into hash code learning, capturing both inter-modality and intra-modality relationships within a unified framework, with the goal of preserving inter-modal complementarity and intra-modal consistency in multimodal data. Our approach involves: (1) mapping data from all modalities into a shared latent semantic space through inter-modality matrix factorization to derive unified hash codes, and (2) mapping data from each modality into a modality-specific latent semantic space via intra-modality matrix factorization to obtain modality-specific hash codes. These are then merged to construct the final hash codes. Experimental results demonstrate that our approach surpasses several state-of-the-art cross-modal image-text retrieval hashing methods. Ablation studies further validate the effectiveness of each component of the model.
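The two-stage recipe in the abstract — a shared latent space learned jointly across modalities, plus one latent space per modality, with binarized factors concatenated into the final code — can be illustrated with a minimal alternating-least-squares sketch. This is a hypothetical toy illustration of the general matrix-factorization-hashing idea, not the authors' method: the dimensions, regularizer `lam`, and the omission of the paper's semantic-label supervision are all assumptions of this sketch.

```python
import numpy as np

def mf_hash(X1, X2, k_shared=16, k_specific=8, iters=20, lam=1e-2, seed=0):
    """Toy two-stage matrix-factorization hashing (illustrative sketch only).

    X1: image features (d1 x n), X2: text features (d2 x n).
    Returns binary codes of length k_shared + 2*k_specific for n items.
    """
    rng = np.random.default_rng(seed)
    n = X1.shape[1]

    # (1) Inter-modality factorization: both modalities share one latent V,
    #     minimizing ||X1 - U1 V||^2 + ||X2 - U2 V||^2 + ridge terms.
    V = rng.standard_normal((k_shared, n))
    I = np.eye(k_shared)
    for _ in range(iters):
        U1 = X1 @ V.T @ np.linalg.inv(V @ V.T + lam * I)
        U2 = X2 @ V.T @ np.linalg.inv(V @ V.T + lam * I)
        A = U1.T @ U1 + U2.T @ U2 + lam * I
        V = np.linalg.solve(A, U1.T @ X1 + U2.T @ X2)

    # (2) Intra-modality factorization: a separate latent space per modality.
    def factorize(X, k):
        Vm = rng.standard_normal((k, n))
        Ik = np.eye(k)
        for _ in range(iters):
            P = X @ Vm.T @ np.linalg.inv(Vm @ Vm.T + lam * Ik)
            Vm = np.linalg.solve(P.T @ P + lam * Ik, P.T @ X)
        return Vm

    V1 = factorize(X1, k_specific)
    V2 = factorize(X2, k_specific)

    # Merge: binarize each latent representation and concatenate.
    codes = np.sign(np.vstack([V, V1, V2]))
    codes[codes == 0] = 1  # map the rare exact zeros to +1
    return codes
```

In an actual supervised variant, the label matrix would enter the objective as an extra factorization or regression term so that semantically similar pairs receive similar codes; retrieval then reduces to ranking database items by Hamming distance to the query's code.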


Source journal: Applied Intelligence (Engineering & Technology — Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published per year: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses solutions to real-life manufacturing, defense, management, government, and industrial problems that are too complex for conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches to solving complex problems is of particular importance. The journal presents new and original research and technological developments addressing real and complex issues, and provides a medium for exchanging scientific research and technological achievements accomplished by the international community.