Adaptable Adapters

Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych
{"title":"Adaptable Adapters","authors":"N. Moosavi, Quentin Delfosse, K. Kersting, Iryna Gurevych","doi":"10.48550/arXiv.2205.01549","DOIUrl":null,"url":null,"abstract":"State-of-the-art pretrained NLP models contain a hundred million to trillion parameters. Adapters provide a parameter-efficient alternative for the full finetuning in which we can only finetune lightweight neural network layers on top of pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture—i.e., the same adapter layer on top of each layer of the pretrained model—for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters that contain (1) learning different activation functions for different layers and different input data, and (2) a learnable switch to select and only use the beneficial adapter layers. We show that adaptable adapters achieve on-par performances with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the selected adapter architecture by adaptable adapters transfers well across different data settings and similar tasks. We propose to use adaptable adapters for designing efficient and effective adapter architectures. The resulting adapters (a) contain about 50% of the learning parameters of the standard adapter and are therefore more efficient at training and inference, and require less storage space, and (b) achieve considerably higher performances in low-data settings.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.01549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

State-of-the-art pretrained NLP models contain hundreds of millions to trillions of parameters. Adapters provide a parameter-efficient alternative to full finetuning, in which only lightweight neural network layers are finetuned on top of the pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture (i.e., the same adapter layer on top of each layer of the pretrained model) for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters, which contain (1) learnable activation functions that can differ across layers and across input data, and (2) a learnable switch that selects and uses only the beneficial adapter layers. We show that adaptable adapters achieve performance on par with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the adapter architecture selected by adaptable adapters transfers well across different data settings and similar tasks. We propose using adaptable adapters to design efficient and effective adapter architectures. The resulting adapters (a) contain about 50% of the learnable parameters of the standard adapter, and are therefore more efficient at training and inference and require less storage space, and (b) achieve considerably higher performance in low-data settings.
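The abstract does not spell out the mechanics, so the PyTorch sketch below is one plausible reading of it, not the authors' implementation: a standard bottleneck adapter (down-projection, activation, up-projection, residual connection) whose activation is a learnable rational function and whose use is governed by a learnable binary switch, relaxed with Gumbel-softmax during training so the selection stays differentiable. The class names `RationalActivation` and `AdaptableAdapterLayer` and the hyperparameters (`degree_p`, `degree_q`, `bottleneck`, `tau`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RationalActivation(nn.Module):
    """Learnable activation f(x) = P(x) / (1 + |Q(x)|) with polynomials P, Q.

    Keeping the denominator >= 1 makes the function defined everywhere.
    Each adapter layer owns its own coefficients, so different layers can
    learn different activation shapes, as the abstract describes.
    """

    def __init__(self, degree_p: int = 3, degree_q: int = 2):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(degree_p + 1))
        self.q = nn.Parameter(0.1 * torch.randn(degree_q))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x ** (i + 1) for i, c in enumerate(self.q)))
        return num / den


class AdaptableAdapterLayer(nn.Module):
    """Bottleneck adapter with a residual connection and a learnable switch.

    The switch chooses between the identity (skip this adapter) and the
    adapter transform. During training the choice is relaxed with
    Gumbel-softmax; at inference, layers whose switch settles on "skip"
    can be dropped entirely, which is how a large fraction of the adapter
    parameters can be removed.
    """

    def __init__(self, hidden_size: int, bottleneck: int = 64, tau: float = 1.0):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = RationalActivation()
        self.up = nn.Linear(bottleneck, hidden_size)
        self.switch_logits = nn.Parameter(torch.zeros(2))  # [skip, use]
        self.tau = tau

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Straight-through Gumbel-softmax: hard 0/1 forward pass,
            # soft gradient so the switch logits remain trainable.
            gate = F.gumbel_softmax(self.switch_logits, tau=self.tau, hard=True)[1]
        else:
            gate = (self.switch_logits.argmax() == 1).float()
        return hidden + gate * self.up(self.act(self.down(hidden)))


# Usage: one adapter layer on top of a transformer block's hidden states.
layer = AdaptableAdapterLayer(hidden_size=768)
out = layer(torch.randn(8, 128, 768))  # (batch, seq_len, hidden)
```

In this reading, one such layer would sit on top of each pretrained transformer layer; after training, the switch logits indicate which adapter layers were beneficial, and the rest can be discarded.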