Dual-Driven Cross-Modal Contrastive Hashing Retrieval Network via Structural Feature and Semantic Information

IF 14.7 · JCR Region 1 (Computer Science) · Q1, Computer Science, Artificial Intelligence
Cheng Huang , Wenzhe Liu , Jinghua Wang , Jinrong Cui , Jie Wen
DOI: 10.1016/j.inffus.2025.103252
Journal: Information Fusion, Volume 123, Article 103252
Published: 2025-04-30 (Journal Article; not open access)
Citations: 0

Abstract

The contrastive-based cross-modal hashing retrieval network, widely acknowledged for its exceptional performance in binary hash code learning, has garnered significant recognition in the field. However, three issues remain worthy of further investigation: (1) how to capture the structural features among intra-modal data and efficiently utilize them for subsequent hash code representation learning; (2) how to promote intra-modal learning and enhance the robustness of the resulting intra-modal features, which are as important as the inter-modal features; (3) how to effectively harness semantic information to guide the hash code learning process. In response to the above issues, this paper proposes a method called Dual-Driven Cross-Modal Contrastive Hashing Retrieval Network via Structural Feature and Semantic Information (DDSS), which consists of three components. First, DDSS extracts visual-modal and textual-modal features via Contrastive Language-Image Pre-training (CLIP) and takes them as the input for cross-modal hashing retrieval. Second, DDSS uses a Dual Branch Feature Learning Module to learn both structural features and self-attention features. Through intra-modal and inter-modal feature contrastive learning, DDSS promotes information consistency across modalities and eliminates low-quality private features within a single modality. Third, DDSS uses a Dual Path Instance Hashing Module to guide the hash code representation learning process through instance-level and semantic-level contrastive learning. Experimental results demonstrate that DDSS outperforms benchmark methods in the cross-modal hashing retrieval field. The source code is available at: https://github.com/hcpaper/DDSS.
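As a rough illustration of the two ideas the abstract combines — cross-modal contrastive learning over paired image/text features and binary hash code retrieval — the sketch below shows a generic symmetric InfoNCE-style loss and sign-based binarization with Hamming-distance lookup. This is not the authors' DDSS implementation: the temperature `tau`, the feature dimension, and the `sign` binarization are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Unit-normalize features so dot products equal cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_modal_contrastive_loss(img_feat, txt_feat, tau=0.07):
    """Symmetric InfoNCE: the i-th image and i-th text form a positive
    pair; all other pairs in the batch act as negatives."""
    img = l2_normalize(img_feat)
    txt = l2_normalize(txt_feat)
    logits = img @ txt.T / tau               # (B, B) similarity matrix
    idx = np.arange(len(img))

    def nll(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[idx, idx].mean()     # -log p(match | anchor)

    # Average the image->text and text->image directions.
    return 0.5 * (nll(logits) + nll(logits.T))

def binarize(codes):
    # Relaxed continuous codes -> binary hash codes in {-1, +1}.
    return np.where(codes >= 0, 1, -1)

def hamming_distance(a, b):
    # For codes in {-1, +1}: d_H = (K - a.b) / 2, with K the code length.
    return (a.shape[-1] - a @ b.T) / 2

# Toy batch: 4 loosely aligned image/text feature pairs of dimension 16.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))
txt = img + 0.1 * rng.normal(size=(4, 16))

loss = cross_modal_contrastive_loss(img, txt)
codes_i, codes_t = binarize(img), binarize(txt)
dist = hamming_distance(codes_i, codes_t)     # (4, 4) distance matrix
```

With aligned pairs, matched codes sit close in Hamming space while mismatched pairs land near the random-code expectation of K/2 bits, which is what makes binary codes usable for fast retrieval.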
Source journal: Information Fusion (Engineering & Technology — Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.