{"title":"$$cal{Y}$ -Tuning:通过标签表示学习对大规模预训练模型进行高效调整的范例","authors":"Yitao Liu, Chenxin An, Xipeng Qiu","doi":"10.1007/s11704-023-3131-8","DOIUrl":null,"url":null,"abstract":"<p>With current success of large-scale pre-trained models (PTMs), how efficiently adapting PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but needs to save and compute the gradient of the whole computational graph. In this paper, we propose <span>\\(\\cal{Y}\\)</span>-Tuning, an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks. <span>\\(\\cal{Y}\\)</span>-Tuning learns dense representations for labels <span>\\(\\cal{Y}\\)</span> defined in a given task and aligns them to fixed feature representation. Without computing the gradients of text encoder at training phrase, <span>\\(\\cal{Y}\\)</span>-Tuning is not only parameter-efficient but also training-efficient. Experimental results show that for DeBERTa<sub>XXL</sub> with 1.6 billion parameters, <span>\\(\\cal{Y}\\)</span>-Tuning achieves performance more than 96% of full fine-tuning on GLUE Benchmark with only 2% tunable parameters and much fewer training costs.</p>","PeriodicalId":12640,"journal":{"name":"Frontiers of Computer Science","volume":"38 1","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"$$\\\\cal{Y}$$ -Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning\",\"authors\":\"Yitao Liu, Chenxin An, Xipeng Qiu\",\"doi\":\"10.1007/s11704-023-3131-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>With current success of large-scale pre-trained models (PTMs), how efficiently adapting PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but needs to save and compute the gradient of the whole computational graph. In this paper, we propose <span>\\\\(\\\\cal{Y}\\\\)</span>-Tuning, an efficient yet effective paradigm to adapt frozen large-scale PTMs to specific downstream tasks. <span>\\\\(\\\\cal{Y}\\\\)</span>-Tuning learns dense representations for labels <span>\\\\(\\\\cal{Y}\\\\)</span> defined in a given task and aligns them to fixed feature representation. Without computing the gradients of text encoder at training phrase, <span>\\\\(\\\\cal{Y}\\\\)</span>-Tuning is not only parameter-efficient but also training-efficient. 
Experimental results show that for DeBERTa<sub>XXL</sub> with 1.6 billion parameters, <span>\\\\(\\\\cal{Y}\\\\)</span>-Tuning achieves performance more than 96% of full fine-tuning on GLUE Benchmark with only 2% tunable parameters and much fewer training costs.</p>\",\"PeriodicalId\":12640,\"journal\":{\"name\":\"Frontiers of Computer Science\",\"volume\":\"38 1\",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2023-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers of Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11704-023-3131-8\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11704-023-3131-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
$$\cal{Y}$$ -Tuning: an efficient tuning paradigm for large-scale pre-trained models via label representation learning
With the current success of large-scale pre-trained models (PTMs), how to efficiently adapt PTMs to downstream tasks has attracted tremendous attention, especially for PTMs with billions of parameters. Previous work focuses on designing parameter-efficient tuning paradigms but still needs to store and compute the gradients of the whole computational graph. In this paper, we propose \(\cal{Y}\)-Tuning, an efficient yet effective paradigm for adapting frozen large-scale PTMs to specific downstream tasks. \(\cal{Y}\)-Tuning learns dense representations for the labels \(\cal{Y}\) defined in a given task and aligns them to fixed feature representations. Because it does not compute gradients of the text encoder during training, \(\cal{Y}\)-Tuning is not only parameter-efficient but also training-efficient. Experimental results show that for DeBERTaXXL with 1.6 billion parameters, \(\cal{Y}\)-Tuning achieves more than 96% of the performance of full fine-tuning on the GLUE benchmark with only 2% tunable parameters and much lower training costs.
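The core recipe described in the abstract, a frozen text encoder whose fixed features are aligned with small trainable label representations, can be sketched in a few lines of PyTorch. The sketch below is an illustrative assumption rather than the paper's exact architecture: the module names, the toy stand-in encoder, the linear alignment layer, and the dot-product scoring are all placeholders. What it demonstrates is the training-efficiency point, that only the label-side parameters receive gradients while the encoder runs under torch.no_grad().

```python
# Minimal sketch of the Y-Tuning idea from the abstract (assumed details, not
# the paper's exact design): freeze the PTM, learn dense label representations,
# and align frozen features to them with a lightweight trainable head.
import torch
import torch.nn as nn


class YTuningHead(nn.Module):
    def __init__(self, num_labels: int, label_dim: int, feature_dim: int):
        super().__init__()
        # Dense, trainable representations for the task labels Y.
        self.label_embeddings = nn.Parameter(torch.randn(num_labels, label_dim))
        # Projects frozen encoder features into the label space (assumed component).
        self.align = nn.Linear(feature_dim, label_dim)

    def forward(self, frozen_features: torch.Tensor) -> torch.Tensor:
        # frozen_features: (batch, feature_dim), produced by the frozen encoder.
        aligned = self.align(frozen_features)           # (batch, label_dim)
        return aligned @ self.label_embeddings.t()      # (batch, num_labels) logits


# Toy stand-in for a frozen PTM: embed token ids and flatten to a fixed vector.
encoder = nn.Sequential(nn.Embedding(30522, 768), nn.Flatten(1))
for p in encoder.parameters():
    p.requires_grad_(False)                             # encoder stays frozen

head = YTuningHead(num_labels=2, label_dim=128, feature_dim=768 * 16)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

tokens = torch.randint(0, 30522, (4, 16))               # toy batch of token ids
labels = torch.randint(0, 2, (4,))

with torch.no_grad():                                   # no encoder gradients, as in the abstract
    features = encoder(tokens)

optimizer.zero_grad()
loss = nn.functional.cross_entropy(head(features), labels)
loss.backward()                                         # gradients flow only through the head
optimizer.step()
```

Because the encoder's activations and gradients never need to be stored for backpropagation, the trainable state is limited to the label embeddings and the alignment layer, which is what makes this setup both parameter- and training-efficient in spirit.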
Journal introduction:
Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles on a wide range of topics, including architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, and interdisciplinary work. The journal especially encourages papers from newly emerging and multidisciplinary areas, papers reflecting international trends in research and development, and papers on special topics reporting progress made by Chinese computer scientists.