Adaptable Adapters

Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych
{"title":"Adaptable Adapters","authors":"N. Moosavi, Quentin Delfosse, K. Kersting, Iryna Gurevych","doi":"10.48550/arXiv.2205.01549","DOIUrl":null,"url":null,"abstract":"State-of-the-art pretrained NLP models contain a hundred million to trillion parameters. Adapters provide a parameter-efficient alternative for the full finetuning in which we can only finetune lightweight neural network layers on top of pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture—i.e., the same adapter layer on top of each layer of the pretrained model—for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters that contain (1) learning different activation functions for different layers and different input data, and (2) a learnable switch to select and only use the beneficial adapter layers. We show that adaptable adapters achieve on-par performances with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the selected adapter architecture by adaptable adapters transfers well across different data settings and similar tasks. We propose to use adaptable adapters for designing efficient and effective adapter architectures. The resulting adapters (a) contain about 50% of the learning parameters of the standard adapter and are therefore more efficient at training and inference, and require less storage space, and (b) achieve considerably higher performances in low-data settings.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.01549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

State-of-the-art pretrained NLP models contain hundreds of millions to trillions of parameters. Adapters provide a parameter-efficient alternative to full finetuning, in which only lightweight neural network layers are finetuned on top of the pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture (i.e., the same adapter layer on top of each layer of the pretrained model) for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters, which contain (1) learnable activation functions that can differ across layers and across input data, and (2) a learnable switch that selects and uses only the beneficial adapter layers. We show that adaptable adapters achieve performance on par with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the adapter architecture selected by adaptable adapters transfers well across different data settings and similar tasks. We propose using adaptable adapters to design efficient and effective adapter architectures. The resulting adapters (a) contain about 50% of the learnable parameters of the standard adapter, and are therefore more efficient at training and inference and require less storage space, and (b) achieve considerably higher performance in low-data settings.
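The abstract does not spell out the mechanics, so the PyTorch sketch below is one plausible reading of it, not the authors' implementation: a standard bottleneck adapter (down-projection, activation, up-projection, residual connection) whose activation is a learnable rational function and whose use is governed by a learnable binary switch, relaxed with Gumbel-softmax during training so the selection stays differentiable. The class names `RationalActivation` and `AdaptableAdapterLayer` and the hyperparameters (`degree_p`, `degree_q`, `bottleneck`, `tau`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RationalActivation(nn.Module):
    """Learnable activation f(x) = P(x) / (1 + |Q(x)|) with polynomials P, Q.

    Keeping the denominator >= 1 makes the function defined everywhere.
    Each adapter layer owns its own coefficients, so different layers can
    learn different activation shapes, as the abstract describes.
    """

    def __init__(self, degree_p: int = 3, degree_q: int = 2):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(degree_p + 1))
        self.q = nn.Parameter(0.1 * torch.randn(degree_q))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x ** (i + 1) for i, c in enumerate(self.q)))
        return num / den


class AdaptableAdapterLayer(nn.Module):
    """Bottleneck adapter with a residual connection and a learnable switch.

    The switch chooses between the identity (skip this adapter) and the
    adapter transform. During training the choice is relaxed with
    Gumbel-softmax; at inference, layers whose switch settles on "skip"
    can be dropped entirely, which is how a large fraction of the adapter
    parameters can be removed.
    """

    def __init__(self, hidden_size: int, bottleneck: int = 64, tau: float = 1.0):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = RationalActivation()
        self.up = nn.Linear(bottleneck, hidden_size)
        self.switch_logits = nn.Parameter(torch.zeros(2))  # [skip, use]
        self.tau = tau

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Straight-through Gumbel-softmax: hard 0/1 forward pass,
            # soft gradient so the switch logits remain trainable.
            gate = F.gumbel_softmax(self.switch_logits, tau=self.tau, hard=True)[1]
        else:
            gate = (self.switch_logits.argmax() == 1).float()
        return hidden + gate * self.up(self.act(self.down(hidden)))


# Usage: one adapter layer on top of a transformer block's hidden states.
layer = AdaptableAdapterLayer(hidden_size=768)
out = layer(torch.randn(8, 128, 768))  # (batch, seq_len, hidden)
```

In this reading, one such layer would sit on top of each pretrained transformer layer; after training, the switch logits indicate which adapter layers were beneficial, and the rest can be discarded.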