Latent structure-oriented asymmetric hashing for cross-modal retrieval
Authors: Jiajun Ma
Journal: Neurocomputing, volume 651, Article 130938 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 5.5)
Publication date: 2025-07-11
DOI: 10.1016/j.neucom.2025.130938
URL: https://www.sciencedirect.com/science/article/pii/S0925231225016108
Citations: 0
Abstract
Cross-modal hashing has attracted considerable attention in cross-modal retrieval due to its excellent computational efficiency and retrieval performance. Most existing methods aim to map multimodal data into a common representation space where either semantic similarity or instance similarity is preserved. However, these methods do not consider the potential clustering structure of instances that characterizes sample separability, resulting in degraded retrieval performance. Furthermore, capturing the consistent instance similarity by effectively fusing the similarities of different modalities remains an essential problem to be addressed. To tackle these issues, this paper proposes a novel latent structure-oriented asymmetric cross-modal hashing method (LSOAH) for cross-modal retrieval. Specifically, LSOAH formulates common representation learning as an orthogonal decomposition, in which each modality-specific instance is projected and decomposed into a modality-specific base matrix and a common cluster indicator matrix, and the indicator matrix is concatenated with the hash code via an asymmetric mechanism. Additionally, we apply the Hadamard product to graphs from different modalities to explore the consistent instance similarity, and embed it in the common representation. Finally, a unified objective function is presented to enable the simultaneous exploration of the cluster structure, instance similarity and semantic similarity, as well as hash code learning, upon which an alternating optimization algorithm is developed with theoretically proven convergence. Experimental results on three benchmark datasets confirm the superiority of the proposed LSOAH for cross-modal retrieval.
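The graph-fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the per-modality graphs are built, so the clipped cosine-similarity graph below, the helper names `cosine_affinity` and `fuse_graphs`, and the random feature matrices are all assumptions made for illustration; only the use of an elementwise (Hadamard) product across modalities comes from the text.

```python
import numpy as np

def cosine_affinity(X):
    """Nonnegative cosine-similarity graph over the rows of X.

    Assumed graph construction; the abstract does not specify one.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)   # guard against zero-norm rows
    return np.clip(Xn @ Xn.T, 0.0, 1.0)  # keep affinities in [0, 1]

def fuse_graphs(graphs):
    """Hadamard (elementwise) product of same-shaped affinity matrices."""
    fused = np.ones_like(graphs[0])
    for S in graphs:
        fused = fused * S
    return fused

# Hypothetical paired features: n instances seen in two modalities.
rng = np.random.default_rng(0)
n = 6
X_img = rng.standard_normal((n, 8))  # stand-in for image features
X_txt = rng.standard_normal((n, 5))  # stand-in for text features

S_img = cosine_affinity(X_img)
S_txt = cosine_affinity(X_txt)
S = fuse_graphs([S_img, S_txt])      # consistent instance similarity
```

Because every entry lies in [0, 1], the fused weight for a pair of instances never exceeds its weight in either single-modality graph, so a pair stays strongly connected only when both modalities agree that the instances are similar.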
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.