Cheng Huang , Wenzhe Liu , Jinghua Wang , Jinrong Cui , Jie Wen
{"title":"Dual-Driven Cross-Modal Contrastive Hashing Retrieval Network Via Structural Feature and Semantic Information","authors":"Cheng Huang , Wenzhe Liu , Jinghua Wang , Jinrong Cui , Jie Wen","doi":"10.1016/j.inffus.2025.103252","DOIUrl":null,"url":null,"abstract":"<div><div>The contrastive-based cross-modal hashing retrieval network, which is widely acknowledged for its exceptional performance in binary hash code learning, has garnered significant recognition in the field. However, there remain three issues that worth further investigation, including: (1) How to capture the structural features among intra-modal data and efficiently utilize them for subsequent hash code representation learning; (2) How to promote intra-modal learning and enhance the robustness of the resulting intra-model features, which are equally important as the inter-modal features; (3) How to effectively harness the semantic information to guide the hash code learning process. In response to above issues, this paper proposes a method called <strong>D</strong>ual-<strong>D</strong>riven Cross-Modal Contrastive Hashing Retrieval Network via <strong>S</strong>tructural Feature and <strong>S</strong>emantic Information (DDSS), which consists of three components. Firstly, DDSS extracts visual-modal and textual-modal features via Contrastive Language-Image Pre-training (CLIP) and takes them as the input for cross-modal hashing retrieval. Secondly, DDSS uses a Dual Branch Feature Learning Module to learn both structural features and self-attention features. Through intra-modal and inter-modal feature contrastive learning, our DDSS promotes the information consistency of different modalities and eliminates low-quality private features within single modality. Thirdly, our DDSS has a Dual Path Instance Hashing Module to guide hash code representation learning process through instance level and semantic level contrastive learning. 
The experimental results demonstrated that DDSS outperforms the benchmark methods of cross-modal hashing retrieval field. The experimental source code can be accessed through the following link: <span><span>https://github.com/hcpaper/DDSS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103252"},"PeriodicalIF":14.7000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525003252","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
The contrastive-based cross-modal hashing retrieval network, which is widely acknowledged for its exceptional performance in binary hash code learning, has garnered significant recognition in the field. However, three issues warrant further investigation: (1) how to capture the structural features among intra-modal data and efficiently utilize them for subsequent hash code representation learning; (2) how to promote intra-modal learning and enhance the robustness of the resulting intra-modal features, which are as important as the inter-modal features; (3) how to effectively harness semantic information to guide the hash code learning process. In response to the above issues, this paper proposes a method called Dual-Driven Cross-Modal Contrastive Hashing Retrieval Network via Structural Feature and Semantic Information (DDSS), which consists of three components. First, DDSS extracts visual-modal and textual-modal features via Contrastive Language-Image Pre-training (CLIP) and takes them as the input for cross-modal hashing retrieval. Second, DDSS uses a Dual Branch Feature Learning Module to learn both structural features and self-attention features. Through intra-modal and inter-modal feature contrastive learning, DDSS promotes the information consistency of different modalities and eliminates low-quality private features within a single modality. Third, DDSS employs a Dual Path Instance Hashing Module to guide the hash code representation learning process through instance-level and semantic-level contrastive learning. The experimental results demonstrate that DDSS outperforms the benchmark methods in the cross-modal hashing retrieval field. The experimental source code can be accessed through the following link: https://github.com/hcpaper/DDSS.
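The abstract's core ingredients — matched image/text features, instance-level contrastive alignment, and binary hash codes — can be sketched at a high level. The following is a minimal NumPy illustration of a symmetric InfoNCE-style cross-modal contrastive loss with sign-based binarization; it is not the paper's implementation, and the function names, temperature value, and dimensions are illustrative assumptions only:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize each row so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def cross_modal_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    """InfoNCE-style loss: the i-th image and i-th text form a positive
    pair; all other pairings in the batch serve as negatives."""
    img = l2_normalize(img_feats)
    txt = l2_normalize(txt_feats)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    labels = np.arange(len(logits))               # diagonal entries are positives

    def ce(lg):
        # Numerically stable softmax cross-entropy over each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(lg)), labels].mean()

    # Average over both retrieval directions (image->text and text->image).
    return 0.5 * (ce(logits) + ce(logits.T))

def binarize(feats):
    """Quantize continuous codes to {-1, +1} hash codes via the sign function."""
    return np.where(feats >= 0, 1, -1)

# Toy batch: text features are a noisy copy of image features,
# mimicking roughly aligned cross-modal pairs.
rng = np.random.default_rng(0)
img = rng.standard_normal((4, 16))
txt = img + 0.1 * rng.standard_normal((4, 16))
loss = cross_modal_contrastive_loss(img, txt)
codes = binarize(img)
```

In a real hashing network the continuous codes would come from learned projection heads and the sign quantization would be relaxed (e.g. with tanh) during training; the sketch only shows the contrastive objective and the final binarization step.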
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.