Hypergraph-based CLIP hashing for unsupervised cross-modal retrieval

IF 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Qian Zhang, Jia-Rui Zhao, Xiao-Qian Liu, Yu-Wei Zhan, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu
{"title":"用于无监督跨模态检索的基于超图的CLIP哈希","authors":"Qian Zhang ,&nbsp;Jia-Rui Zhao ,&nbsp;Xiao-Qian Liu ,&nbsp;Yu-Wei Zhan ,&nbsp;Zhen-Duo Chen ,&nbsp;Xin Luo ,&nbsp;Xin-Shun Xu","doi":"10.1016/j.knosys.2025.114508","DOIUrl":null,"url":null,"abstract":"<div><div>With the surge of multi-modal data, how to effectively and efficiently find similar information has become an urgent and important need. Among the existing solutions, unsupervised cross-modal hashing can learn from unlabeled data and provide fast and satisfactory retrieval performance, making it a viable solution. However, existing unsupervised cross-modal hashing methods often inadequately model intricate cross-modal semantic relationships. To bridge this gap, this paper proposes a novel Hypergraph-based CLIP Hashing (HCH). Specifically, HCH utilizes the large-scale visual-language pre-trained model CLIP to extract visual and textual features, and employs a cross-modal Transformer to further enhance semantic fusion among these features. Then, to fully capture the semantic relevance among multi-modal data, we construct a semantic-enhanced similarity matrix and design a mean-based weighting scheme to adjust this matrix. Additionally, we compose a hypergraph convolutional network to further explore high-order semantic information within the input data, leading to more compact and high-quality hash codes. To substantiate HCH’s efficacy, we conducted experiments on three commonly used datasets, confirming its superiority over leading baselines.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114508"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hypergraph-based CLIP hashing for unsupervised cross-modal retrieval\",\"authors\":\"Qian Zhang ,&nbsp;Jia-Rui Zhao ,&nbsp;Xiao-Qian Liu ,&nbsp;Yu-Wei Zhan ,&nbsp;Zhen-Duo Chen ,&nbsp;Xin Luo ,&nbsp;Xin-Shun Xu\",\"doi\":\"10.1016/j.knosys.2025.114508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the surge of multi-modal data, how to effectively and efficiently find similar information has become an urgent and important need. Among the existing solutions, unsupervised cross-modal hashing can learn from unlabeled data and provide fast and satisfactory retrieval performance, making it a viable solution. However, existing unsupervised cross-modal hashing methods often inadequately model intricate cross-modal semantic relationships. To bridge this gap, this paper proposes a novel Hypergraph-based CLIP Hashing (HCH). Specifically, HCH utilizes the large-scale visual-language pre-trained model CLIP to extract visual and textual features, and employs a cross-modal Transformer to further enhance semantic fusion among these features. Then, to fully capture the semantic relevance among multi-modal data, we construct a semantic-enhanced similarity matrix and design a mean-based weighting scheme to adjust this matrix. Additionally, we compose a hypergraph convolutional network to further explore high-order semantic information within the input data, leading to more compact and high-quality hash codes. 
To substantiate HCH’s efficacy, we conducted experiments on three commonly used datasets, confirming its superiority over leading baselines.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"330 \",\"pages\":\"Article 114508\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125015473\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125015473","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

With the surge of multi-modal data, how to effectively and efficiently find similar information has become an urgent and important need. Among the existing solutions, unsupervised cross-modal hashing can learn from unlabeled data and provide fast and satisfactory retrieval performance, making it a viable solution. However, existing unsupervised cross-modal hashing methods often inadequately model intricate cross-modal semantic relationships. To bridge this gap, this paper proposes a novel Hypergraph-based CLIP Hashing (HCH). Specifically, HCH utilizes the large-scale visual-language pre-trained model CLIP to extract visual and textual features, and employs a cross-modal Transformer to further enhance semantic fusion among these features. Then, to fully capture the semantic relevance among multi-modal data, we construct a semantic-enhanced similarity matrix and design a mean-based weighting scheme to adjust this matrix. Additionally, we compose a hypergraph convolutional network to further explore high-order semantic information within the input data, leading to more compact and high-quality hash codes. To substantiate HCH's efficacy, we conducted experiments on three commonly used datasets, confirming its superiority over leading baselines.
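The abstract does not spell out the similarity construction, but a minimal sketch of one plausible reading of its "semantic-enhanced similarity matrix" with "mean-based weighting" is given below. The fusion weight `alpha`, the scaling factor `beta`, and the exact weighting rule are illustrative assumptions, not the authors' design.

```python
# Hypothetical sketch: fuse intra-modal CLIP similarities, then re-weight
# each entry relative to its row mean. Not the paper's exact formulation.
import torch
import torch.nn.functional as F

def build_similarity(img_feat: torch.Tensor, txt_feat: torch.Tensor,
                     alpha: float = 0.5, beta: float = 0.3) -> torch.Tensor:
    """img_feat, txt_feat: (n, d) CLIP features for n paired samples.
    Returns an (n, n) semantic-enhanced similarity matrix in [-1, 1]."""
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    s_img = img @ img.t()                      # image-image cosine similarity
    s_txt = txt @ txt.t()                      # text-text cosine similarity
    s = alpha * s_img + (1.0 - alpha) * s_txt  # fused cross-modal similarity

    # Mean-based weighting: amplify entries above each row's mean
    # (likely semantic neighbors) and shrink those below it.
    row_mean = s.mean(dim=1, keepdim=True)
    s = torch.where(s >= row_mean, s * (1.0 + beta), s * (1.0 - beta))
    return s.clamp(-1.0, 1.0)
```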
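For the hypergraph step, the sketch below shows a standard hypergraph convolution in the HGNN style (D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2} X Θ) mapping fused features to relaxed hash codes. The k-NN hyperedge construction, the 64-bit code length, and the tanh/sign binarization are common conventions assumed here for illustration; the paper's exact architecture may differ.

```python
# Hedged sketch of a hypergraph convolution producing hash codes.
import torch
import torch.nn as nn

def knn_incidence(feat: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Each sample spawns one hyperedge linking it to its k nearest
    neighbors, giving an (n_nodes, n_edges) incidence matrix H."""
    dist = torch.cdist(feat, feat)
    idx = dist.topk(k + 1, largest=False).indices  # self + k neighbors
    H = torch.zeros_like(dist)
    H.scatter_(1, idx, 1.0)        # row e marks the members of hyperedge e
    return H.t()                   # rows -> nodes, columns -> hyperedges

class HypergraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        dv = H.sum(dim=1).clamp(min=1.0)   # node degrees
        de = H.sum(dim=0).clamp(min=1.0)   # hyperedge degrees
        Dv = torch.diag(dv.pow(-0.5))
        De = torch.diag(de.pow(-1.0))
        return Dv @ H @ De @ H.t() @ Dv @ self.theta(x)

# Toy usage: (n, d) fused features -> 64-bit codes. tanh relaxes the
# binarization during training; sign yields the final {-1, +1} codes.
feat = torch.randn(128, 512)
conv = HypergraphConv(512, 64)
codes = torch.sign(torch.tanh(conv(feat, knn_incidence(feat, k=8))))
```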
Source journal
Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Annual publications: 1245
Average review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.