Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning

Arnav Chavan, Rishabh Tiwari, Udbhav Bamba, D. Gupta
DOI: 10.1109/CVPR52688.2022.00962
Published in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Citations: 2

Abstract

Gradient-based meta-learning methods are prone to overfitting on the meta-training set, and this behaviour is more prominent with large and complex networks. Moreover, large networks restrict the application of meta-learning models on low-power edge devices. While choosing smaller networks avoids these issues to a certain extent, it affects the overall generalization, leading to reduced performance. Clearly, there is an approximately optimal choice of network architecture that is best suited for every meta-learning problem; however, identifying it beforehand is not straightforward. In this paper, we present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models that generalize well on unseen tasks in meta-learning. Our method is based on the hypothesis that for a given set of similar tasks, not all kernels of the network are needed by each individual task. Rather, each task uses only a fraction of the kernels, and the selection of the kernels per task can be learnt dynamically as a part of the inner update steps. MetaDOCK compresses the meta-model as well as the task-specific inner models, thus providing significant reduction in model size for each task, and by constraining the number of active kernels for every task, it implicitly mitigates the issue of meta-overfitting. We show that for the same inference budget, pruned versions of large CNN models obtained using our approach consistently outperform the conventional choices of CNN models. MetaDOCK couples well with popular meta-learning approaches such as iMAML [22]. The efficacy of our method is validated on the CIFAR-fs [1] and mini-ImageNet [28] datasets, and we have observed that our approach can provide improvements in model accuracy of up to 2% on standard meta-learning benchmarks, while reducing the model size by more than 75%. Our code is available at https://github.com/transmuteAI/MetaDOCK.
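The core idea of the abstract — that each task keeps only a fraction of the network's kernels, with the selection learned per task — can be illustrated with a minimal sketch. This is not the authors' implementation; the gate logits, threshold, and function names below are all hypothetical, standing in for gates that would in practice be updated during the inner-loop adaptation steps.

```python
import math

def sigmoid(x):
    """Logistic function mapping a gate logit to a keep-probability."""
    return 1.0 / (1.0 + math.exp(-x))

def active_kernel_mask(gate_logits, threshold=0.5):
    """Binarize per-kernel gates: a kernel is kept for this task only
    if its learned gate probability exceeds the threshold."""
    return [1 if sigmoid(g) > threshold else 0 for g in gate_logits]

def compression_ratio(mask):
    """Fraction of kernels pruned from the task-specific inner model."""
    return 1.0 - sum(mask) / len(mask)

# Hypothetical gate logits for an 8-kernel conv layer after inner updates.
gates = [2.1, -1.3, 0.4, -2.8, 1.7, -0.6, 3.0, -1.9]
mask = active_kernel_mask(gates)
print(mask)                     # [1, 0, 1, 0, 1, 0, 1, 0]
print(compression_ratio(mask))  # 0.5
```

In this toy example, half the kernels are inactive for the task, so the task-specific model carries only half the kernels of the meta-model; constraining how many gates may be active is what the abstract credits with mitigating meta-overfitting.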