Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning

Arnav Chavan, Rishabh Tiwari, Udbhav Bamba, D. Gupta
DOI: 10.1109/CVPR52688.2022.00962
Published in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Citations: 2

Abstract

Gradient-based meta-learning methods are prone to overfitting on the meta-training set, and this behaviour is more prominent with large and complex networks. Moreover, large networks restrict the application of meta-learning models on low-power edge devices. While choosing smaller networks avoids these issues to a certain extent, it affects the overall generalization, leading to reduced performance. Clearly, there is an approximately optimal choice of network architecture that is best suited for every meta-learning problem; however, identifying it beforehand is not straightforward. In this paper, we present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models that generalize well on unseen tasks in meta-learning. Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task. Rather, each task uses only a fraction of the kernels, and the selection of the kernels per task can be learnt dynamically as a part of the inner update steps. MetaDOCK compresses the meta-model as well as the task-specific inner models, thus providing significant reduction in model size for each task, and by constraining the number of active kernels for every task, it implicitly mitigates the issue of meta-overfitting. We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models. MetaDOCK couples well with popular meta-learning approaches such as iMAML [22]. The efficacy of our method is validated on the CIFAR-fs [1] and mini-ImageNet [28] datasets, and we have observed that our approach can provide improvements in model accuracy of up to 2% on standard meta-learning benchmarks, while reducing the model size by more than 75%. Our code is available at https://github.com/transmuteAI/MetaDOCK.
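The core idea of the abstract — that each task keeps only a fraction of the network's kernels, with the selection learned per task — can be illustrated with a minimal sketch. This is not the authors' implementation; the gate logits, threshold, and function names below are all hypothetical, standing in for gates that would in practice be updated during the inner-loop adaptation steps.

```python
import math

def sigmoid(x):
    """Logistic function mapping a gate logit to a keep-probability."""
    return 1.0 / (1.0 + math.exp(-x))

def active_kernel_mask(gate_logits, threshold=0.5):
    """Binarize per-kernel gates: a kernel is kept for this task only
    if its learned gate probability exceeds the threshold."""
    return [1 if sigmoid(g) > threshold else 0 for g in gate_logits]

def compression_ratio(mask):
    """Fraction of kernels pruned from the task-specific inner model."""
    return 1.0 - sum(mask) / len(mask)

# Hypothetical gate logits for an 8-kernel conv layer after inner updates.
gates = [2.1, -1.3, 0.4, -2.8, 1.7, -0.6, 3.0, -1.9]
mask = active_kernel_mask(gates)
print(mask)                     # [1, 0, 1, 0, 1, 0, 1, 0]
print(compression_ratio(mask))  # 0.5
```

In this toy example, half the kernels are inactive for the task, so the task-specific model carries only half the kernels of the meta-model; constraining how many gates may be active is what the abstract credits with mitigating meta-overfitting.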