Task-Adaptive Multi-Source Representations for Few-Shot Image Recognition

Information · Pub Date: 2024-05-21 · DOI: 10.3390/info15060293
Ge Liu, Zhongqiang Zhang, Xiangzhong Fang
Citations: 0

Abstract

Conventional few-shot learning (FSL) mainly focuses on knowledge transfer from a single source dataset to a recognition scenario that offers only a few training samples but remains similar to the source domain. In this paper, we consider a more practical FSL setting in which multiple semantically different datasets are available to address a wide range of FSL tasks, especially recognition scenarios beyond natural images, such as remote sensing and medical imagery; this setting can be referred to as multi-source cross-domain FSL. To tackle the problem, we propose a two-stage learning scheme, termed learning and adapting multi-source representations (LAMR). In the first stage, we propose a multi-head network to obtain efficient multi-domain representations, in which all source domains share the same backbone except for the last parallel projection layers, which provide domain specialization. We train the representations in a multi-task setting where each in-domain classification task is handled by a cosine classifier. In the second stage, considering that instance discrimination and class discrimination are both crucial for robust recognition, we propose two contrastive objectives that adapt the pre-trained representations to be task-specialized on the few-shot data. Careful ablation studies verify that LAMR significantly improves representation transferability, yielding consistent performance gains. We also extend LAMR to single-source FSL by introducing a dataset-splitting strategy that equally splits one source dataset into sub-domains. The empirical results show that LAMR achieves state-of-the-art (SOTA) performance on the BSCD-FSL benchmark and competitive performance on mini-ImageNet, highlighting its versatility and effectiveness for FSL on both natural and domain-specific imagery.
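The first-stage design described above (a shared backbone with parallel per-domain projection heads, each feeding a cosine classifier) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy random linear "backbone", all dimensions, and the temperature value are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
IN_DIM, FEAT_DIM, PROJ_DIM = 32, 16, 8
N_DOMAINS, N_CLASSES = 3, 5

# Shared backbone: a fixed random linear map standing in for a CNN encoder.
W_backbone = rng.normal(size=(IN_DIM, FEAT_DIM))

# One projection head per source domain (the "parallel projection layers").
W_heads = [rng.normal(size=(FEAT_DIM, PROJ_DIM)) for _ in range(N_DOMAINS)]

# Per-domain class weight vectors for the cosine classifiers.
W_cls = [rng.normal(size=(N_CLASSES, PROJ_DIM)) for _ in range(N_DOMAINS)]

def l2_normalize(x, axis=-1):
    """Normalize rows to unit length (with a small epsilon for stability)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def forward(x, domain, tau=0.1):
    """Cosine-classifier logits for a batch x routed through one domain head."""
    feat = x @ W_backbone              # shared multi-domain representation
    proj = feat @ W_heads[domain]      # domain-specialized projection
    # Cosine similarity between normalized embeddings and class weights,
    # sharpened by a temperature tau.
    return l2_normalize(proj) @ l2_normalize(W_cls[domain]).T / tau

x = rng.normal(size=(4, IN_DIM))       # a toy mini-batch of 4 samples
logits = forward(x, domain=1)
print(logits.shape)                    # (4, 5): one logit per class
```

In a multi-task training loop, each batch would be drawn from one source domain and a cross-entropy loss applied to that domain's cosine logits, so the backbone sees gradients from every domain while each head specializes.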