Deep hashing image retrieval based on CNN and visual transformer network

IF 7.2 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Soft Computing Pub Date : 2025-05-17 DOI:10.1016/j.asoc.2025.113244

Shuli Cheng, Xingming Xiao, Liejun Wang

Volume 177, Article 113244. Source: https://www.sciencedirect.com/science/article/pii/S1568494625005551

Citations: 0

Abstract

Deep hashing can provide a unified representation of multimedia data and is widely used in fields such as smart agriculture, smart transportation, and public safety. However, current deep hashing methods do not fuse global and local information efficiently, and their ability to learn hash code representations needs further strengthening. In this paper, we propose a deep hashing retrieval algorithm based on an integrated CNN and visual Transformer (ICVT) network. First, we propose a lightweight nonlinear spatial group enhancement (NSGE) module, which is integrated with the Transformer through a parallel architecture and introduces a self-attention mechanism that uses the similarity between local and global features to enhance the spatial distribution of semantic features within each semantic feature group, improving the CNN representation capability of the feature maps. Second, we propose a margin contrast loss function to optimize the hash code parameters, which improves retrieval accuracy on multi-label datasets. Finally, extensive experiments on the CIFAR-10, NUS-WIDE, and ImageNet datasets demonstrate that ICVT achieves superior retrieval performance compared with currently popular algorithms; on CIFAR-10 in particular, mAP reaches 97.5%.
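For readers unfamiliar with how hash-based retrieval is scored, the following is a minimal, generic sketch of the retrieval and evaluation steps the abstract refers to: real-valued outputs are binarized into hash codes, database items are ranked by Hamming distance to the query code, and average precision is computed per query (mAP is its mean over all queries). It is not the ICVT network, the NSGE module, or the margin contrast loss, none of which are specified in the abstract. All function names, the 4-bit codes, and the relevance labels are hypothetical, and the average precision here is normalized by the number of relevant items retrieved, which is one common convention and may differ from the paper's protocol.

```python
from typing import List, Sequence, Set


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued network outputs to a binary code by thresholding at zero."""
    return [1 if x >= 0 else 0 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the bit positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    # sorted() is stable, so ties keep their original database order.
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


def average_precision(ranking: Sequence[int], relevant: Set[int]) -> float:
    """Average precision of one ranked list (assumed convention: divide by hits found)."""
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


# Toy example with hypothetical 4-bit codes (real systems use e.g. 16 to 64 bits).
query_code = binarize([0.8, -0.3, 0.1, -0.9])
database_codes = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
ranking = rank_by_hamming(query_code, database_codes)
# Hypothetical ground truth: items 0 and 3 share a label with the query.
ap = average_precision(ranking, relevant={0, 3})
print(ranking, round(ap, 4))
```

Because codes are compared bit by bit, ranking cost depends only on code length, which is the practical appeal of hashing for large image collections.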

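Source journal: Applied Soft Computing (Engineering & Technology; Computer Science, Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months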

Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication time is short.