Boosting meta-training with base class information for robust few-shot learning
Authors: Weihao Jiang, Guodong Liu, Di He, Kun He
Journal: Engineering Applications of Artificial Intelligence, vol. 152, Article 110780
Publication date: 2025-04-11
DOI: 10.1016/j.engappai.2025.110780
URL: https://www.sciencedirect.com/science/article/pii/S0952197625007808
Impact factor: 8.0 · JCR: Q1 (Automation & Control Systems)
Citations: 0
Abstract
Few-shot learning aims to develop classifiers that can recognize new, unseen classes from only a few labeled examples. Meta-learning, with methods such as Model-Agnostic Meta-Learning (MAML) and Prototypical Networks, has emerged as a key approach to this challenge. A recent advancement, Meta-Baseline, uses sequential pre-training and meta-training to achieve state-of-the-art performance. However, meta-training, which is designed to further adapt the model to few-shot classification tasks, can compromise the class transferability gained during pre-training, leading to suboptimal performance. We propose an end-to-end training paradigm with two alternating loops. The outer loop calculates the cross-entropy loss across the entire training set while updating only the final linear layer. The inner loop uses the original meta-learning method to calculate the loss and integrates gradients from the outer loss to guide parameter updates. This approach enables effective adaptation to few-shot learning tasks while preserving robust generalization, outperforming existing baselines. Our findings suggest that information from the entire training set and the meta-learning training paradigm can enhance each other. Our extensive experiments on the miniImageNet, tieredImageNet, and CUB datasets demonstrate the effectiveness of our method. In the 5-way 1-shot setting, we achieve accuracies of 64.01%, 69.73%, and 68.07%, outperforming the best baselines by 0.94%, 1.11%, and 1.94%, respectively. In the 5-way 5-shot setting, our accuracies are 81.00%, 84.91%, and 83.44%, outperforming the best baselines by 1.74%, 1.17%, and 3.37%, respectively. Additionally, our framework is model-agnostic: integrating it with existing baseline training frameworks yields a performance boost of approximately 1%.
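The two alternating loops described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption made for illustration: the linear backbone `w`, the Prototypical-Network-style episode loss, the mixing weight `lam` (the abstract does not say how the outer-loss gradients are integrated; a weighted sum is assumed here), the toy Gaussian data, and the finite-difference `num_grad` that stands in for automatic differentiation.

```python
import math
import random

D_IN, D_EMB, N_BASE = 4, 3, 6  # toy sizes, chosen only for illustration

def embed(w, x):
    """Linear backbone; w is a flat (D_EMB x D_IN) weight matrix."""
    return [sum(w[i * D_IN + j] * x[j] for j in range(D_IN)) for i in range(D_EMB)]

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[label]

def base_loss(w, v, batch):
    """Cross-entropy over all base classes; v is a flat (N_BASE x D_EMB) linear head."""
    total = 0.0
    for x, y in batch:
        z = embed(w, x)
        logits = [sum(v[k * D_EMB + i] * z[i] for i in range(D_EMB)) for k in range(N_BASE)]
        total += cross_entropy(logits, y)
    return total / len(batch)

def episode_loss(w, support, query):
    """Prototypical-style loss: logits are negative squared distances to class prototypes."""
    classes = sorted({y for _, y in support})
    protos = []
    for c in classes:
        zs = [embed(w, x) for x, y in support if y == c]
        protos.append([sum(col) / len(zs) for col in zip(*zs)])
    total = 0.0
    for x, y in query:
        z = embed(w, x)
        logits = [-sum((a - b) ** 2 for a, b in zip(z, p)) for p in protos]
        total += cross_entropy(logits, classes.index(y))
    return total / len(query)

def num_grad(f, params, eps=1e-5):
    """Central finite differences; stands in for autodiff to keep the sketch stdlib-only."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def outer_step(w, v, batch, lr):
    """Outer loop: base-class cross-entropy, updating only the final linear layer v."""
    g_v = num_grad(lambda p: base_loss(w, p, batch), v)
    return [p - lr * g for p, g in zip(v, g_v)]

def inner_step(w, v, support, query, batch, lr, lam):
    """Inner loop: episode gradient plus lam times the base-class gradient (assumed rule)."""
    g_meta = num_grad(lambda p: episode_loss(p, support, query), w)
    g_base = num_grad(lambda p: base_loss(p, v, batch), w)
    return [p - lr * (gm + lam * gb) for p, gm, gb in zip(w, g_meta, g_base)]

def make_data(rng, per_class=5):
    """Toy base set: one Gaussian blob per base class."""
    data = []
    for c in range(N_BASE):
        center = [rng.gauss(0, 1) for _ in range(D_IN)]
        data += [([m + rng.gauss(0, 0.1) for m in center], c) for _ in range(per_class)]
    return data

def train(data, steps=20, lr=0.02, lam=0.5, seed=0):
    rng = random.Random(seed)
    w = [rng.gauss(0, 0.3) for _ in range(D_EMB * D_IN)]
    v = [rng.gauss(0, 0.3) for _ in range(N_BASE * D_EMB)]
    for _ in range(steps):
        batch = rng.sample(data, 12)
        v = outer_step(w, v, batch, lr)             # changes the head only
        support, query = [], []
        for c in rng.sample(range(N_BASE), 3):      # 3-way 1-shot toy episode
            picks = rng.sample([d for d in data if d[1] == c], 3)
            support.append(picks[0])
            query += picks[1:]
        w = inner_step(w, v, support, query, batch, lr, lam)  # changes the backbone only
    return w, v
```

Calling `train(make_data(random.Random(0)))` returns the backbone and head weights after alternating the two steps. Keeping `outer_step` and `inner_step` as separate pure functions makes the division of labor explicit: the head is only ever trained on the base-class loss, while the backbone sees both the episode gradient and the base-class gradient.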
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.