{"title":"基于自校准协同注意的深度跨视觉语义哈希","authors":"Hao Feng, Xiangbo Zhou, Yue Wu, Jian Zhou, Banglei Zhao","doi":"10.1016/j.asoc.2025.113937","DOIUrl":null,"url":null,"abstract":"<div><div>Deep hashing has garnered considerable attention due to its remarkable retrieval efficiency and low storage cost, particularly in visual retrieval scenarios. However, current deep hashing methods generally integrate hash coding into a single-stream architecture, which limits the discriminative power of learned visual features and yields suboptimal hash codes. Additionally, over-reliance on semantic labels shared across samples fails to fully exploit the intrinsic semantic correlations between labels and corresponding visual features. To address these issues, we propose a deep cross-visual semantic hashing (DCvSH) method for image retrieval. First, we develop a visual image feature decoupling encoding network that leverages a self-calibrated collaborative attention mechanism to disentangle common and specific semantics across related images. These decoupled features are fed into a shared decoder for image reconstruction, yielding discriminative visual feature representations. Second, we construct a cross-visual semantic representation learning network with a two-level multi-layer perceptron to capture the underlying relationships between semantic label encodings and visual feature embeddings, while a hypergraph structure is introduced to preserve pairwise similarity relationships. Experimental results on the CIFAR-10, NUS-WIDE, and MIRFLICKR datasets demonstrate consistent improvements, with average mean average precision (mAP) scores reaching 0.895, 0.874, and 0.881 at different code lengths, respectively. Notably, DCvSH outperforms other baselines across all evaluation metrics.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"185 ","pages":"Article 113937"},"PeriodicalIF":6.6000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep cross-visual semantic hashing with self-calibrated collaborative attention\",\"authors\":\"Hao Feng, Xiangbo Zhou, Yue Wu, Jian Zhou, Banglei Zhao\",\"doi\":\"10.1016/j.asoc.2025.113937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Deep hashing has garnered considerable attention due to its remarkable retrieval efficiency and low storage cost, particularly in visual retrieval scenarios. However, current deep hashing methods generally integrate hash coding into a single-stream architecture, which limits the discriminative power of learned visual features and yields suboptimal hash codes. Additionally, over-reliance on semantic labels shared across samples fails to fully exploit the intrinsic semantic correlations between labels and corresponding visual features. To address these issues, we propose a deep cross-visual semantic hashing (DCvSH) method for image retrieval. First, we develop a visual image feature decoupling encoding network that leverages a self-calibrated collaborative attention mechanism to disentangle common and specific semantics across related images. These decoupled features are fed into a shared decoder for image reconstruction, yielding discriminative visual feature representations. 
Second, we construct a cross-visual semantic representation learning network with a two-level multi-layer perceptron to capture the underlying relationships between semantic label encodings and visual feature embeddings, while a hypergraph structure is introduced to preserve pairwise similarity relationships. Experimental results on the CIFAR-10, NUS-WIDE, and MIRFLICKR datasets demonstrate consistent improvements, with average mean average precision (mAP) scores reaching 0.895, 0.874, and 0.881 at different code lengths, respectively. Notably, DCvSH outperforms other baselines across all evaluation metrics.</div></div>\",\"PeriodicalId\":50737,\"journal\":{\"name\":\"Applied Soft Computing\",\"volume\":\"185 \",\"pages\":\"Article 113937\"},\"PeriodicalIF\":6.6000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Soft Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1568494625012505\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625012505","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Deep cross-visual semantic hashing with self-calibrated collaborative attention
Deep hashing has garnered considerable attention due to its remarkable retrieval efficiency and low storage cost, particularly in visual retrieval scenarios. However, current deep hashing methods generally integrate hash coding into a single-stream architecture, which limits the discriminative power of learned visual features and yields suboptimal hash codes. Additionally, over-reliance on semantic labels shared across samples fails to fully exploit the intrinsic semantic correlations between labels and their corresponding visual features. To address these issues, we propose a deep cross-visual semantic hashing (DCvSH) method for image retrieval. First, we develop a visual image feature decoupling encoding network that leverages a self-calibrated collaborative attention mechanism to disentangle common and specific semantics across related images. These decoupled features are fed into a shared decoder for image reconstruction, yielding discriminative visual feature representations. Second, we construct a cross-visual semantic representation learning network with a two-level multi-layer perceptron to capture the underlying relationships between semantic label encodings and visual feature embeddings, while a hypergraph structure is introduced to preserve pairwise similarity relationships. Experimental results on the CIFAR-10, NUS-WIDE, and MIRFLICKR datasets demonstrate consistent improvements, with mean average precision (mAP) scores, averaged over different code lengths, of 0.895, 0.874, and 0.881, respectively. Notably, DCvSH outperforms all baselines across every evaluation metric.
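As context for the reported mAP figures: deep hashing methods are typically evaluated by binarizing learned embeddings into compact codes, ranking the database by Hamming distance to each query code, and computing mean average precision over that ranking. The sketch below is a minimal, illustrative version of this standard evaluation protocol, not the authors' DCvSH implementation; the function names (`binarize`, `hamming_rank`, `mean_average_precision`), the sign-based binarization, and the single-label relevance criterion are all assumptions made for the example.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Map continuous embeddings to {0, 1} codes by thresholding each
    dimension at zero. (Illustrative stand-in for a learned hash layer;
    not the DCvSH network itself.)"""
    return (embeddings > 0).astype(np.uint8)

def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Return database indices sorted by ascending Hamming distance
    (number of differing bits) to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """Single-label mAP over the full Hamming ranking: for each query,
    average the precision at every rank where a relevant item appears,
    then average those APs over all queries."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        order = hamming_rank(q_code, db_codes)
        relevant = db_labels[order] == q_label
        if not relevant.any():
            continue
        hits = np.cumsum(relevant)                 # relevant items seen so far
        ranks = np.arange(1, len(order) + 1)       # 1-based rank positions
        precisions = hits[relevant] / ranks[relevant]
        aps.append(precisions.mean())
    return float(np.mean(aps))

# Toy usage: random 32-bit codes over 3 classes (hypothetical data).
rng = np.random.default_rng(0)
db_codes = binarize(rng.standard_normal((1000, 32)))
query_codes = binarize(rng.standard_normal((10, 32)))
db_labels = rng.integers(0, 3, 1000)
query_labels = rng.integers(0, 3, 10)
print(f"mAP: {mean_average_precision(query_codes, db_codes, query_labels, db_labels):.3f}")
```

For multi-label benchmarks such as NUS-WIDE and MIRFLICKR, relevance is usually defined as sharing at least one label with the query, and mAP is often truncated to the top-k retrieved items; the single-label variant above keeps the example short.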
Journal introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.