Hypergraph-based CLIP hashing for unsupervised cross-modal retrieval

IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Qian Zhang, Jia-Rui Zhao, Xiao-Qian Liu, Yu-Wei Zhan, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu
{"title":"用于无监督跨模态检索的基于超图的CLIP哈希","authors":"Qian Zhang ,&nbsp;Jia-Rui Zhao ,&nbsp;Xiao-Qian Liu ,&nbsp;Yu-Wei Zhan ,&nbsp;Zhen-Duo Chen ,&nbsp;Xin Luo ,&nbsp;Xin-Shun Xu","doi":"10.1016/j.knosys.2025.114508","DOIUrl":null,"url":null,"abstract":"<div><div>With the surge of multi-modal data, how to effectively and efficiently find similar information has become an urgent and important need. Among the existing solutions, unsupervised cross-modal hashing can learn from unlabeled data and provide fast and satisfactory retrieval performance, making it a viable solution. However, existing unsupervised cross-modal hashing methods often inadequately model intricate cross-modal semantic relationships. To bridge this gap, this paper proposes a novel Hypergraph-based CLIP Hashing (HCH). Specifically, HCH utilizes the large-scale visual-language pre-trained model CLIP to extract visual and textual features, and employs a cross-modal Transformer to further enhance semantic fusion among these features. Then, to fully capture the semantic relevance among multi-modal data, we construct a semantic-enhanced similarity matrix and design a mean-based weighting scheme to adjust this matrix. Additionally, we compose a hypergraph convolutional network to further explore high-order semantic information within the input data, leading to more compact and high-quality hash codes. To substantiate HCH’s efficacy, we conducted experiments on three commonly used datasets, confirming its superiority over leading baselines.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114508"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hypergraph-based CLIP hashing for unsupervised cross-modal retrieval\",\"authors\":\"Qian Zhang ,&nbsp;Jia-Rui Zhao ,&nbsp;Xiao-Qian Liu ,&nbsp;Yu-Wei Zhan ,&nbsp;Zhen-Duo Chen ,&nbsp;Xin Luo ,&nbsp;Xin-Shun Xu\",\"doi\":\"10.1016/j.knosys.2025.114508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the surge of multi-modal data, how to effectively and efficiently find similar information has become an urgent and important need. Among the existing solutions, unsupervised cross-modal hashing can learn from unlabeled data and provide fast and satisfactory retrieval performance, making it a viable solution. However, existing unsupervised cross-modal hashing methods often inadequately model intricate cross-modal semantic relationships. To bridge this gap, this paper proposes a novel Hypergraph-based CLIP Hashing (HCH). Specifically, HCH utilizes the large-scale visual-language pre-trained model CLIP to extract visual and textual features, and employs a cross-modal Transformer to further enhance semantic fusion among these features. Then, to fully capture the semantic relevance among multi-modal data, we construct a semantic-enhanced similarity matrix and design a mean-based weighting scheme to adjust this matrix. Additionally, we compose a hypergraph convolutional network to further explore high-order semantic information within the input data, leading to more compact and high-quality hash codes. 
To substantiate HCH’s efficacy, we conducted experiments on three commonly used datasets, confirming its superiority over leading baselines.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"330 \",\"pages\":\"Article 114508\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125015473\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015473","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

With the surge of multi-modal data, how to effectively and efficiently find similar information has become an urgent and important need. Among the existing solutions, unsupervised cross-modal hashing can learn from unlabeled data and provide fast and satisfactory retrieval performance, making it a viable solution. However, existing unsupervised cross-modal hashing methods often inadequately model intricate cross-modal semantic relationships. To bridge this gap, this paper proposes a novel Hypergraph-based CLIP Hashing (HCH). Specifically, HCH utilizes the large-scale visual-language pre-trained model CLIP to extract visual and textual features, and employs a cross-modal Transformer to further enhance semantic fusion among these features. Then, to fully capture the semantic relevance among multi-modal data, we construct a semantic-enhanced similarity matrix and design a mean-based weighting scheme to adjust this matrix. Additionally, we compose a hypergraph convolutional network to further explore high-order semantic information within the input data, leading to more compact and high-quality hash codes. To substantiate HCH's efficacy, we conducted experiments on three commonly used datasets, confirming its superiority over leading baselines.
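The abstract does not spell out the similarity construction, but a minimal sketch of one plausible reading of its "semantic-enhanced similarity matrix" with "mean-based weighting" is given below. The fusion weight `alpha`, the scaling factor `beta`, and the exact weighting rule are illustrative assumptions, not the authors' design.

```python
# Hypothetical sketch: fuse intra-modal CLIP similarities, then re-weight
# each entry relative to its row mean. Not the paper's exact formulation.
import torch
import torch.nn.functional as F

def build_similarity(img_feat: torch.Tensor, txt_feat: torch.Tensor,
                     alpha: float = 0.5, beta: float = 0.3) -> torch.Tensor:
    """img_feat, txt_feat: (n, d) CLIP features for n paired samples.
    Returns an (n, n) semantic-enhanced similarity matrix in [-1, 1]."""
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    s_img = img @ img.t()                      # image-image cosine similarity
    s_txt = txt @ txt.t()                      # text-text cosine similarity
    s = alpha * s_img + (1.0 - alpha) * s_txt  # fused cross-modal similarity

    # Mean-based weighting: amplify entries above each row's mean
    # (likely semantic neighbors) and shrink those below it.
    row_mean = s.mean(dim=1, keepdim=True)
    s = torch.where(s >= row_mean, s * (1.0 + beta), s * (1.0 - beta))
    return s.clamp(-1.0, 1.0)
```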
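For the hypergraph step, the sketch below shows a standard hypergraph convolution in the HGNN style (D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} X Θ) mapping fused features to relaxed hash codes. The k-NN hyperedge construction, the 64-bit code length, and the tanh/sign binarization are common conventions assumed here for illustration; the paper's exact architecture may differ.

```python
# Hedged sketch of a hypergraph convolution producing hash codes.
import torch
import torch.nn as nn

def knn_incidence(feat: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Each sample spawns one hyperedge linking it to its k nearest
    neighbors, giving an (n_nodes, n_edges) incidence matrix H."""
    dist = torch.cdist(feat, feat)
    idx = dist.topk(k + 1, largest=False).indices  # self + k neighbors
    H = torch.zeros_like(dist)
    H.scatter_(1, idx, 1.0)        # row e marks the members of hyperedge e
    return H.t()                   # rows -> nodes, columns -> hyperedges

class HypergraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        dv = H.sum(dim=1).clamp(min=1.0)   # node degrees
        de = H.sum(dim=0).clamp(min=1.0)   # hyperedge degrees
        Dv = torch.diag(dv.pow(-0.5))
        De = torch.diag(de.pow(-1.0))
        return Dv @ H @ De @ H.t() @ Dv @ self.theta(x)

# Toy usage: (n, d) fused features -> 64-bit codes. tanh relaxes the
# binarization during training; sign yields the final {-1, +1} codes.
feat = torch.randn(128, 512)
conv = HypergraphConv(512, 64)
codes = torch.sign(torch.tanh(conv(feat, knn_incidence(feat, k=8))))
```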
Source journal
Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Annual publications: 1245
Average review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.