Dual-View Data Hallucination With Semantic Relation Guidance for Few-Shot Image Recognition

Impact factor: 8.4; CAS Zone 1, Computer Science; JCR Q1, Computer Science, Information Systems

IEEE Transactions on Multimedia, published 2024-09-02, DOI: 10.1109/TMM.2024.3453055

Hefeng Wu, Guangzhi Ye, Ziyang Zhou, Ling Tian, Qing Wang, Liang Lin

Vol. 26, pp. 11302-11315. IEEE Xplore: https://ieeexplore.ieee.org/document/10663278/

Citations: 0

Abstract

Learning to recognize novel concepts from just a few image samples is very challenging, because the learned model easily overfits the few available samples and generalizes poorly. One promising but underexplored solution is to compensate for the scarcity of novel-class data by generating plausible samples. However, most existing works along this line exploit only visual information, so the generated data are easily distracted by challenging factors contained in the few available samples. Recognizing that semantic information in the textual modality reflects human concepts, this work proposes a novel framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition. The framework generates more diverse and reasonable samples for novel classes by effectively transferring information from base classes. Specifically, an instance-view data hallucination module hallucinates new data from each sample of a novel class by employing local semantic correlated attention and global semantic feature fusion derived from base classes. Meanwhile, a prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class and its associated distribution from the few samples; the prototype then serves as a more stable sample, and the distribution enables resampling a large number of samples. We conduct extensive experiments and comparisons with state-of-the-art methods on several popular few-shot benchmarks to verify the effectiveness of the proposed framework.
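The prototype-view idea (estimate a novel-class prototype and distribution from a few samples, then resample many features from it) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes features are plain vectors, that each base class is summarized by a mean and a per-dimension variance, and it stands in for the paper's semantic-aware measure with a softmax over cosine similarities between class-name embeddings, borrowing base-class statistics in the spirit of distribution calibration. The names `semantic_weights`, `estimate_prototype_distribution`, `resample`, `alpha` and `temperature` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def semantic_weights(novel_emb: Vector, base_embs: Dict[str, Vector],
                     temperature: float = 0.1) -> Dict[str, float]:
    """Assumed semantic measure: softmax over cosine similarities of class-name embeddings."""
    logits = {c: cosine(novel_emb, e) / temperature for c, e in base_embs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}


def estimate_prototype_distribution(
        support: List[Vector], novel_emb: Vector,
        base_stats: Dict[str, Dict[str, Vector]], base_embs: Dict[str, Vector],
        alpha: float = 0.5) -> Tuple[Vector, Vector]:
    """Blend the support mean with semantically weighted base-class statistics.

    alpha (hypothetical) weights the few-shot support mean against the
    base-class prior; the variance is borrowed entirely from base classes.
    """
    dim = len(support[0])
    support_mean = [sum(x[d] for x in support) / len(support) for d in range(dim)]
    w = semantic_weights(novel_emb, base_embs)
    prior_mean = [sum(w[c] * base_stats[c]["mean"][d] for c in w) for d in range(dim)]
    prototype = [alpha * s + (1 - alpha) * p for s, p in zip(support_mean, prior_mean)]
    variance = [sum(w[c] * base_stats[c]["var"][d] for c in w) for d in range(dim)]
    return prototype, variance


def resample(prototype: Vector, variance: Vector, n: int, seed: int = 0) -> List[Vector]:
    """Draw n hallucinated features from a diagonal Gaussian centered on the prototype."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, math.sqrt(v)) for mu, v in zip(prototype, variance)]
            for _ in range(n)]


# Toy usage: a 1-shot novel class whose name embedding is close to "dog".
base_embs = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}
base_stats = {"dog": {"mean": [2.0, 0.0], "var": [0.5, 0.5]},
              "car": {"mean": [0.0, 2.0], "var": [0.1, 0.1]}}
support = [[1.0, 0.2]]
proto, var = estimate_prototype_distribution(support, [0.9, 0.1], base_stats, base_embs)
samples = resample(proto, var, n=100)
```

Blending the support mean with a semantically weighted base-class prior is what makes the estimated prototype more stable than the raw one-shot mean. The diagonal variance keeps the sketch dependency-free; a full covariance, as distribution-calibration methods often use, would need a linear-algebra library.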


Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors, ensuring comprehensive coverage of multimedia research.