Deep hashing image retrieval based on CNN and visual transformer network

IF 7.2 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Shuli Cheng, Xingming Xiao, Liejun Wang
Journal: Applied Soft Computing, volume 177, Article 113244
Publication date: 2025-05-17
Publication type: Journal Article
DOI: 10.1016/j.asoc.2025.113244
URL: https://www.sciencedirect.com/science/article/pii/S1568494625005551
Citations: 0

Abstract

Deep hashing image retrieval based on CNN and visual transformer network
Deep hashing can provide a unified representation of multimedia data and is widely used in fields such as smart agriculture, smart transportation, and public safety. However, current deep hashing methods do not fuse global and local information efficiently, and their ability to learn hash code representations needs further strengthening. In this paper, we propose a deep hashing retrieval algorithm based on an integrated CNN and visual Transformer (ICVT) network. First, we propose a lightweight nonlinear spatial group enhancement (NSGE) module, which is integrated with the Transformer through a parallel architecture and introduces a self-attention mechanism that uses the similarity between local and global features to enhance the spatial distribution of semantic features. This improves the CNN representation capability of the feature maps and enhances the spatial distribution of semantic features within each semantic feature group. Second, we propose a margin contrast loss function to optimize the hash code parameters, which improves the retrieval accuracy of the algorithm on multi-label datasets. Finally, extensive experiments on the CIFAR-10, NUS-WIDE, and ImageNet datasets demonstrate that ICVT achieves superior retrieval performance compared with currently popular algorithms, especially on CIFAR-10, where the mAP reaches 97.5%.
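The abstract reports retrieval quality as mAP over hash codes but does not describe the evaluation procedure. The following is a minimal Python sketch of how hash-based retrieval is commonly scored, not the authors' implementation. It assumes that codes are obtained by thresholding real-valued network outputs at zero, that the database is ranked by Hamming distance, and that a database item counts as relevant when it shares at least one label with the query (a common convention for multi-label data). All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Set


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued outputs to a {0, 1} hash code by thresholding at zero (assumed)."""
    return [1 if value >= 0 else 0 for value in features]


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the bit positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def rank_database(query_code: Sequence[int],
                  database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    distances = [hamming_distance(query_code, code) for code in database_codes]
    # sorted() is stable, so ties keep their original database order.
    return sorted(range(len(database_codes)), key=lambda index: distances[index])


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list; 0.0 when no relevant item is retrieved."""
    hits = 0
    precision_sum = 0.0
    for position, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes: Sequence[Sequence[int]],
                           query_labels: Sequence[Set[str]],
                           database_codes: Sequence[Sequence[int]],
                           database_labels: Sequence[Set[str]]) -> float:
    """mAP over all queries; sharing any label counts as relevant (assumed convention)."""
    scores = []
    for code, labels in zip(query_codes, query_labels):
        ranking = rank_database(code, database_codes)
        relevance = [bool(labels & database_labels[index]) for index in ranking]
        scores.append(average_precision(relevance))
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
database_codes = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0]]
database_labels = [{"cat"}, {"dog"}, {"cat"}, {"dog", "car"}]
query_codes = [binarize([-0.3, -0.9, -0.1, 0.4]), binarize([0.2, 0.5, 0.7, -0.1])]
query_labels = [{"cat"}, {"cat"}]

print(mean_average_precision(query_codes, query_labels, database_codes, database_labels))
```

In this toy run the first query ranks both relevant items first (average precision 1.0), while the second ranks them last (average precision 5/12), so the printed mAP is 17/24. Published evaluations often truncate the ranking to the top k results; this sketch scores the full ranking.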
Source journal: Applied Soft Computing
Category: Engineering and Technology, Computer Science: Interdisciplinary Applications
CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months
Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore continuously updated with new articles, and publication times are short.