{"title":"基于自校准协同注意的深度跨视觉语义哈希","authors":"Hao Feng, Xiangbo Zhou, Yue Wu, Jian Zhou, Banglei Zhao","doi":"10.1016/j.asoc.2025.113937","DOIUrl":null,"url":null,"abstract":"<div><div>Deep hashing has garnered considerable attention due to its remarkable retrieval efficiency and low storage cost, particularly in visual retrieval scenarios. However, current deep hashing methods generally integrate hash coding into a single-stream architecture, which limits the discriminative power of learned visual features and yields suboptimal hash codes. Additionally, over-reliance on semantic labels shared across samples fails to fully exploit the intrinsic semantic correlations between labels and corresponding visual features. To address these issues, we propose a deep cross-visual semantic hashing (DCvSH) method for image retrieval. First, we develop a visual image feature decoupling encoding network that leverages a self-calibrated collaborative attention mechanism to disentangle common and specific semantics across related images. These decoupled features are fed into a shared decoder for image reconstruction, yielding discriminative visual feature representations. Second, we construct a cross-visual semantic representation learning network with a two-level multi-layer perceptron to capture the underlying relationships between semantic label encodings and visual feature embeddings, while a hypergraph structure is introduced to preserve pairwise similarity relationships. Experimental results on the CIFAR-10, NUS-WIDE, and MIRFLICKR datasets demonstrate consistent improvements, with average mean average precision (mAP) scores reaching 0.895, 0.874, and 0.881 at different code lengths, respectively. Notably, DCvSH outperforms other baselines across all evaluation metrics.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"185 ","pages":"Article 113937"},"PeriodicalIF":6.6000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep cross-visual semantic hashing with self-calibrated collaborative attention\",\"authors\":\"Hao Feng, Xiangbo Zhou, Yue Wu, Jian Zhou, Banglei Zhao\",\"doi\":\"10.1016/j.asoc.2025.113937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Deep hashing has garnered considerable attention due to its remarkable retrieval efficiency and low storage cost, particularly in visual retrieval scenarios. However, current deep hashing methods generally integrate hash coding into a single-stream architecture, which limits the discriminative power of learned visual features and yields suboptimal hash codes. Additionally, over-reliance on semantic labels shared across samples fails to fully exploit the intrinsic semantic correlations between labels and corresponding visual features. To address these issues, we propose a deep cross-visual semantic hashing (DCvSH) method for image retrieval. First, we develop a visual image feature decoupling encoding network that leverages a self-calibrated collaborative attention mechanism to disentangle common and specific semantics across related images. These decoupled features are fed into a shared decoder for image reconstruction, yielding discriminative visual feature representations. 
Second, we construct a cross-visual semantic representation learning network with a two-level multi-layer perceptron to capture the underlying relationships between semantic label encodings and visual feature embeddings, while a hypergraph structure is introduced to preserve pairwise similarity relationships. Experimental results on the CIFAR-10, NUS-WIDE, and MIRFLICKR datasets demonstrate consistent improvements, with average mean average precision (mAP) scores reaching 0.895, 0.874, and 0.881 at different code lengths, respectively. Notably, DCvSH outperforms other baselines across all evaluation metrics.</div></div>\",\"PeriodicalId\":50737,\"journal\":{\"name\":\"Applied Soft Computing\",\"volume\":\"185 \",\"pages\":\"Article 113937\"},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Soft Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1568494625012505\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625012505","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Deep cross-visual semantic hashing with self-calibrated collaborative attention
Deep hashing has garnered considerable attention due to its remarkable retrieval efficiency and low storage cost, particularly in visual retrieval scenarios. However, current deep hashing methods generally integrate hash coding into a single-stream architecture, which limits the discriminative power of learned visual features and yields suboptimal hash codes. Additionally, over-reliance on semantic labels shared across samples fails to fully exploit the intrinsic semantic correlations between labels and their corresponding visual features. To address these issues, we propose a deep cross-visual semantic hashing (DCvSH) method for image retrieval. First, we develop a visual image feature decoupling encoding network that leverages a self-calibrated collaborative attention mechanism to disentangle common and specific semantics across related images. These decoupled features are fed into a shared decoder for image reconstruction, yielding discriminative visual feature representations. Second, we construct a cross-visual semantic representation learning network with a two-level multi-layer perceptron to capture the underlying relationships between semantic label encodings and visual feature embeddings, while a hypergraph structure is introduced to preserve pairwise similarity relationships. Experimental results on the CIFAR-10, NUS-WIDE, and MIRFLICKR datasets demonstrate consistent improvements, with mean average precision (mAP) scores, averaged over different code lengths, of 0.895, 0.874, and 0.881, respectively. Notably, DCvSH outperforms all baselines across every evaluation metric.
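As context for the reported mAP figures: deep hashing methods are typically evaluated by binarizing learned embeddings into compact codes, ranking the database by Hamming distance to each query code, and computing mean average precision over that ranking. The sketch below is a minimal, illustrative version of this standard evaluation protocol, not the authors' DCvSH implementation; the function names (`binarize`, `hamming_rank`, `mean_average_precision`), the sign-based binarization, and the single-label relevance criterion are all assumptions made for the example.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Map continuous embeddings to {0, 1} codes by thresholding each
    dimension at zero. (Illustrative stand-in for a learned hash layer;
    not the DCvSH network itself.)"""
    return (embeddings > 0).astype(np.uint8)

def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Return database indices sorted by ascending Hamming distance
    (number of differing bits) to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """Single-label mAP over the full Hamming ranking: for each query,
    average the precision at every rank where a relevant item appears,
    then average those APs over all queries."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        order = hamming_rank(q_code, db_codes)
        relevant = db_labels[order] == q_label
        if not relevant.any():
            continue
        hits = np.cumsum(relevant)                 # relevant items seen so far
        ranks = np.arange(1, len(order) + 1)       # 1-based rank positions
        precisions = hits[relevant] / ranks[relevant]
        aps.append(precisions.mean())
    return float(np.mean(aps))

# Toy usage: random 32-bit codes over 3 classes (hypothetical data).
rng = np.random.default_rng(0)
db_codes = binarize(rng.standard_normal((1000, 32)))
query_codes = binarize(rng.standard_normal((10, 32)))
db_labels = rng.integers(0, 3, 1000)
query_labels = rng.integers(0, 3, 10)
print(f"mAP: {mean_average_precision(query_codes, db_codes, query_labels, db_labels):.3f}")
```

For multi-label benchmarks such as NUS-WIDE and MIRFLICKR, relevance is usually defined as sharing at least one label with the query, and mAP is often truncated to the top-k retrieved items; the single-label variant above keeps the example short.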
Journal introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.