Visual transformer-based image retrieval with multiple loss fusion

Huayong Liu, Cong Huang, Hanjun Jin, Xiaosi Fu, Pei Shi
{"title":"Visual transformer-based image retrieval with multiple loss fusion","authors":"Huayong Liu, Cong Huang, Hanjun Jin, Xiaosi Fu, Pei Shi","doi":"10.1117/12.2685738","DOIUrl":null,"url":null,"abstract":"Through hash learning, the image retrieval based on deep hash algorithm encodes the image into a fixed length hash code for fast retrieval and matching. However, previous deep hash retrieval models based on convolutional neural networks extract local information of the image using pooling and convolution technology, which requires deeper networks to obtain long distance dependency, leading to high complexity and computation. In this paper, we propose a visual Transformer model based on self-attention to learn long dependencies of images and enhance the extraction ability of image features. Furthermore, a loss function with multiple loss fusion is proposed, which combines hash contrastive loss, classification loss, and quantization loss, to fully utilize image label information to improve the quality of hash coding by learning more potential semantic information. Experimental results demonstrate the superior performance of the proposed method over multiple classical deep hash retrieval methods based on CNN and two transformer-based hash retrieval methods, on two different datasets and different lengths of hash code.","PeriodicalId":305812,"journal":{"name":"International Conference on Electronic Information Technology","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Electronic Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2685738","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Through hash learning, the image retrieval based on deep hash algorithm encodes the image into a fixed length hash code for fast retrieval and matching. However, previous deep hash retrieval models based on convolutional neural networks extract local information of the image using pooling and convolution technology, which requires deeper networks to obtain long distance dependency, leading to high complexity and computation. In this paper, we propose a visual Transformer model based on self-attention to learn long dependencies of images and enhance the extraction ability of image features. Furthermore, a loss function with multiple loss fusion is proposed, which combines hash contrastive loss, classification loss, and quantization loss, to fully utilize image label information to improve the quality of hash coding by learning more potential semantic information. Experimental results demonstrate the superior performance of the proposed method over multiple classical deep hash retrieval methods based on CNN and two transformer-based hash retrieval methods, on two different datasets and different lengths of hash code.
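Only the abstract is available on this page, so the sketch below is an illustration of the kind of setup it describes rather than the paper's actual implementation: a visual Transformer backbone that produces relaxed hash codes, trained with a fused objective combining a pairwise hash contrastive loss, an auxiliary classification loss, and a quantization loss. The backbone choice (torchvision's vit_b_16), the tanh relaxation, the single-label cross-entropy head, the contrastive margin, and the loss weights are all assumptions introduced for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vit_b_16


class ViTHashNet(nn.Module):
    """ViT backbone with a hash projection head and an auxiliary
    classification head (hypothetical layer sizes, not from the paper)."""

    def __init__(self, code_len=48, num_classes=10):
        super().__init__()
        self.backbone = vit_b_16(weights="IMAGENET1K_V1")
        self.backbone.heads = nn.Identity()        # expose the 768-d CLS feature
        self.hash_head = nn.Linear(768, code_len)
        self.cls_head = nn.Linear(code_len, num_classes)

    def forward(self, x):
        feat = self.backbone(x)                    # (B, 768) image feature
        codes = torch.tanh(self.hash_head(feat))   # relaxed codes in (-1, 1)
        logits = self.cls_head(codes)              # label prediction from codes
        return codes, logits


def fused_hash_loss(codes, labels, logits, margin=2.0, w_cls=1.0, w_quant=0.1):
    """Multi-loss fusion: contrastive + classification + quantization.
    Margin and weights are illustrative defaults, not the paper's values."""
    # Pairwise similarity from labels: 1 if two images share a class, else 0.
    sim = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    # Pairwise Euclidean distances between relaxed hash codes.
    dist = torch.cdist(codes, codes, p=2)
    # Contrastive term: pull similar pairs together, push dissimilar
    # pairs apart until they exceed the margin.
    contrastive = (sim * dist.pow(2) + (1 - sim) * F.relu(margin - dist).pow(2)).mean()
    # Classification term: lets label semantics shape the codes.
    cls = F.cross_entropy(logits, labels)
    # Quantization term: push continuous codes toward binary {-1, +1}.
    quant = (codes.abs() - 1).pow(2).mean()
    return contrastive + w_cls * cls + w_quant * quant


if __name__ == "__main__":
    model = ViTHashNet(code_len=48, num_classes=10)
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 10, (8,))
    codes, logits = model(images)
    loss = fused_hash_loss(codes, labels, logits)
    print(loss.item())
```

At retrieval time the relaxed codes would be binarized (e.g. with a sign function) and compared by Hamming distance; the exact binarization and evaluation protocol used in the paper is not stated in the abstract.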