Deep hashing image retrieval based on CNN and visual transformer network
Shuli Cheng, Xingming Xiao, Liejun Wang
Applied Soft Computing, Volume 177, Article 113244, published 2025-05-17
DOI: 10.1016/j.asoc.2025.113244
https://www.sciencedirect.com/science/article/pii/S1568494625005551
Deep hashing can produce a unified representation of multimedia data and is widely used in fields such as smart agriculture, smart transportation, and public safety. However, current deep hashing methods do not fuse global and local information efficiently, and their ability to learn hash code representations needs further strengthening. In this paper, we propose a deep hashing retrieval algorithm based on an integrated CNN and visual Transformer (ICVT) network. First, we propose a lightweight nonlinear spatial group enhancement (NSGE) module, which is integrated with the Transformer through a parallel architecture and introduces a self-attention mechanism that uses the similarity between local and global features to enhance the spatial distribution of semantic features. This improves the representation capability of the CNN feature maps and strengthens the spatial distribution of semantic features within each semantic feature group. Second, we propose a margin contrast loss function to optimize the hash code parameters, which improves the algorithm's retrieval accuracy on multi-label datasets. Finally, extensive experiments on the CIFAR-10, NUS-WIDE, and ImageNet datasets demonstrate that ICVT achieves better retrieval performance than currently popular algorithms, especially on CIFAR-10, where its mAP reaches 97.5%.
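The abstract reports retrieval quality as mAP over binary hash codes and trains the codes with a margin contrast loss, but gives neither formula. The following is a minimal sketch of the standard versions, not the authors' ICVT code. It assumes ±1 codes, ranking of the whole database by Hamming distance (mAP@all), the common multi-label convention that a database item is relevant when it shares at least one label with the query, and a classic pairwise contrastive loss with a hypothetical `margin` hyperparameter. All function names are illustrative.

```python
from typing import List, Sequence, Set

# Hypothetical helpers for illustration only; not the ICVT authors' code.


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where two {-1, +1} hash codes differ.

    For K-bit codes this equals (K - dot(a, b)) / 2.
    """
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def average_precision(query_code: Sequence[int], query_labels: Set[int],
                      db_codes: List[Sequence[int]], db_labels: List[Set[int]]) -> float:
    """AP of one query over the full Hamming ranking of the database (AP@all)."""
    # Rank database items by Hamming distance; sorted() is stable, so ties
    # keep their database order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming_distance(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        # Assumed multi-label convention: relevant if any label is shared.
        if query_labels & db_labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, db_codes, db_labels) -> float:
    """Mean of per-query average precision."""
    aps = [average_precision(q, ql, db_codes, db_labels)
           for q, ql in zip(query_codes, query_labels)]
    return sum(aps) / len(aps)


def margin_contrastive_loss(u: Sequence[float], v: Sequence[float],
                            similar: bool, margin: float) -> float:
    """Classic pairwise contrastive loss on relaxed (real-valued) codes.

    Similar pairs are pulled together (0.5 * d^2); dissimilar pairs are pushed
    apart until their Euclidean distance d exceeds `margin`. The margin is a
    hypothetical hyperparameter, not a value taken from the paper.
    """
    d2 = sum((x - y) ** 2 for x, y in zip(u, v))
    if similar:
        return 0.5 * d2
    return 0.5 * max(0.0, margin - d2 ** 0.5) ** 2


if __name__ == "__main__":
    db_codes = [[1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, 1], [-1, 1, 1, 1]]
    db_labels = [{0}, {0}, {1}, {1}]
    queries = [[1, 1, -1, 1], [-1, 1, -1, 1]]
    print(mean_average_precision(queries, [{0}, {0}], db_codes, db_labels))  # 0.75
```

Truncating `order` to its first K items gives the top-K (mAP@K) variant often reported in hashing papers; the abstract does not say which variant produced the 97.5% figure.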
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing for solving real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief accepts them. The website is therefore updated continuously with new articles, and the time to publication is short.