基于部件交互和度量学习的 Logit 变量产品量化与知识提炼，用于细粒度图像检索

IF 8.4 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2024-06-03 DOI:10.1109/TMM.2024.3407661

Lei Ma;Xin Luo;Hanyu Hong;Fanman Meng;Qingbo Wu

{"title":"基于部件交互和度量学习的 Logit 变量产品量化与知识提炼，用于细粒度图像检索","authors":"Lei Ma;Xin Luo;Hanyu Hong;Fanman Meng;Qingbo Wu","doi":"10.1109/TMM.2024.3407661","DOIUrl":null,"url":null,"abstract":"Image retrieval with fine-grained categories is an extremely challenging task due to the high intraclass variance and low interclass variance. Most previous works have focused on localizing discriminative image regions in isolation, but have rarely exploited correlations across the different discriminative regions to alleviate intraclass differences. In addition, the intraclass compactness of embedding features is ensured by extra regularization terms that only exist during the training phase, which appear to generalize less well in the inference phase. Finally, the information granularity of the distance measure should distinguish subtle visual differences and the correlation between the embedding features and the quantized features should be maximized sufficiently. To address the above issues, we propose a logit variated product quantization method based on part interaction and metric learning with knowledge distillation for fine-grained image retrieval. Specifically, we introduce a causal context module into the deep navigator to generate discriminative regions and utilize a channelwise cross-part fusion transformer to model the part correlations while alleviating intraclass differences. Subsequently, we design a logit variation module based on a weighted sum scheme to further reduce the intraclass variance of the embedding features directly and enhance the learning power of the quantization model. Finally, we propose a novel product quantization loss based on metric learning and knowledge distillation to enhance the correlation between the embedding features and the quantized features and allow the quantization features to learn more knowledge from the embedding features. The experimental results on several fine-grained datasets demonstrate that the proposed method is superior to state-of-the-art fine-grained image retrieval methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10406-10419"},"PeriodicalIF":8.4000,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Logit Variated Product Quantization Based on Parts Interaction and Metric Learning With Knowledge Distillation for Fine-Grained Image Retrieval\",\"authors\":\"Lei Ma;Xin Luo;Hanyu Hong;Fanman Meng;Qingbo Wu\",\"doi\":\"10.1109/TMM.2024.3407661\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Image retrieval with fine-grained categories is an extremely challenging task due to the high intraclass variance and low interclass variance. Most previous works have focused on localizing discriminative image regions in isolation, but have rarely exploited correlations across the different discriminative regions to alleviate intraclass differences. In addition, the intraclass compactness of embedding features is ensured by extra regularization terms that only exist during the training phase, which appear to generalize less well in the inference phase. Finally, the information granularity of the distance measure should distinguish subtle visual differences and the correlation between the embedding features and the quantized features should be maximized sufficiently. To address the above issues, we propose a logit variated product quantization method based on part interaction and metric learning with knowledge distillation for fine-grained image retrieval. Specifically, we introduce a causal context module into the deep navigator to generate discriminative regions and utilize a channelwise cross-part fusion transformer to model the part correlations while alleviating intraclass differences. Subsequently, we design a logit variation module based on a weighted sum scheme to further reduce the intraclass variance of the embedding features directly and enhance the learning power of the quantization model. Finally, we propose a novel product quantization loss based on metric learning and knowledge distillation to enhance the correlation between the embedding features and the quantized features and allow the quantization features to learn more knowledge from the embedding features. The experimental results on several fine-grained datasets demonstrate that the proposed method is superior to state-of-the-art fine-grained image retrieval methods.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"26 \",\"pages\":\"10406-10419\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-06-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10546268/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10546268/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

由于类内差异大而类间差异小，细粒度类别的图像检索是一项极具挑战性的任务。以往的大多数研究都侧重于孤立地定位图像的分辨区域，但很少利用不同分辨区域之间的相关性来减轻类内差异。此外，嵌入特征的类内紧凑性是通过额外的正则化项来保证的，而这些正则化项只存在于训练阶段，在推理阶段的泛化效果似乎较差。最后，距离度量的信息粒度应能区分细微的视觉差异，嵌入特征与量化特征之间的相关性应充分最大化。针对上述问题，我们提出了一种基于部分交互和度量学习的 logit 变积量化方法，并将其用于细粒度图像检索的知识提炼。具体来说，我们在深度导航器中引入了一个因果上下文模块，以生成具有区分性的区域，并利用通道式跨部件融合转换器来建立部件相关性模型，同时减轻类内差异。随后，我们设计了一个基于加权和方案的 logit 变异模块，进一步直接降低嵌入特征的类内方差，增强量化模型的学习能力。最后，我们提出了一种基于度量学习和知识提炼的新型乘积量化损失，以增强嵌入特征与量化特征之间的相关性，让量化特征从嵌入特征中学习更多知识。在多个细粒度数据集上的实验结果表明，所提出的方法优于最先进的细粒度图像检索方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Logit Variated Product Quantization Based on Parts Interaction and Metric Learning With Knowledge Distillation for Fine-Grained Image Retrieval

Image retrieval with fine-grained categories is an extremely challenging task due to the high intraclass variance and low interclass variance. Most previous works have focused on localizing discriminative image regions in isolation, but have rarely exploited correlations across the different discriminative regions to alleviate intraclass differences. In addition, the intraclass compactness of embedding features is ensured by extra regularization terms that only exist during the training phase, which appear to generalize less well in the inference phase. Finally, the information granularity of the distance measure should distinguish subtle visual differences and the correlation between the embedding features and the quantized features should be maximized sufficiently. To address the above issues, we propose a logit variated product quantization method based on part interaction and metric learning with knowledge distillation for fine-grained image retrieval. Specifically, we introduce a causal context module into the deep navigator to generate discriminative regions and utilize a channelwise cross-part fusion transformer to model the part correlations while alleviating intraclass differences. Subsequently, we design a logit variation module based on a weighted sum scheme to further reduce the intraclass variance of the embedding features directly and enhance the learning power of the quantization model. Finally, we propose a novel product quantization loss based on metric learning and knowledge distillation to enhance the correlation between the embedding features and the quantized features and allow the quantization features to learn more knowledge from the embedding features. The experimental results on several fine-grained datasets demonstrate that the proposed method is superior to state-of-the-art fine-grained image retrieval methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.