可进行宽度和深度动态推断的灵活 BERT 模型

IF 3.1 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ting Hu, Christoph Meinel, Haojin Yang
{"title":"可进行宽度和深度动态推断的灵活 BERT 模型","authors":"Ting Hu,&nbsp;Christoph Meinel,&nbsp;Haojin Yang","doi":"10.1016/j.csl.2024.101646","DOIUrl":null,"url":null,"abstract":"<div><p>Fine-tuning and inference on Large Language Models like BERT have become increasingly expensive regarding memory cost and computation resources. The recently proposed computation-flexible BERT models facilitate their deployment in varied computational environments. Training such flexible BERT models involves jointly optimizing multiple BERT subnets, which will unavoidably interfere with one another. Besides, the performance of large subnets is limited by the performance gap between the smallest subnet and the supernet, despite efforts to enhance the smaller subnets. In this regard, we propose layer-wise Neural grafting to boost BERT subnets, especially the larger ones. The proposed method improves the average performance of the subnets on six GLUE tasks and boosts the supernets on all GLUE tasks and the SQuAD data set. Based on the boosted subnets, we further build an inference framework enabling practical width- and depth-dynamic inference regarding different inputs by combining width-dynamic gating modules and early exit off-ramps in the depth dimension. Experimental results show that the proposed framework achieves a better dynamic inference range than other methods in terms of trading off performance and computational complexity on four GLUE tasks and SQuAD. In particular, our best-tradeoff inference result outperforms other fixed-size models with similar amount of computations. Compared to BERT-Base, the proposed inference framework yields a 1.3-point improvement in the average GLUE score and a 2.2-point increase in the F1 score on SQuAD, while reducing computations by around 45%.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"87 ","pages":"Article 101646"},"PeriodicalIF":3.1000,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000299/pdfft?md5=bb08debe9a20bb5be9d04abbaca1b345&pid=1-s2.0-S0885230824000299-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A flexible BERT model enabling width- and depth-dynamic inference\",\"authors\":\"Ting Hu,&nbsp;Christoph Meinel,&nbsp;Haojin Yang\",\"doi\":\"10.1016/j.csl.2024.101646\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Fine-tuning and inference on Large Language Models like BERT have become increasingly expensive regarding memory cost and computation resources. The recently proposed computation-flexible BERT models facilitate their deployment in varied computational environments. Training such flexible BERT models involves jointly optimizing multiple BERT subnets, which will unavoidably interfere with one another. Besides, the performance of large subnets is limited by the performance gap between the smallest subnet and the supernet, despite efforts to enhance the smaller subnets. In this regard, we propose layer-wise Neural grafting to boost BERT subnets, especially the larger ones. The proposed method improves the average performance of the subnets on six GLUE tasks and boosts the supernets on all GLUE tasks and the SQuAD data set. Based on the boosted subnets, we further build an inference framework enabling practical width- and depth-dynamic inference regarding different inputs by combining width-dynamic gating modules and early exit off-ramps in the depth dimension. Experimental results show that the proposed framework achieves a better dynamic inference range than other methods in terms of trading off performance and computational complexity on four GLUE tasks and SQuAD. In particular, our best-tradeoff inference result outperforms other fixed-size models with similar amount of computations. Compared to BERT-Base, the proposed inference framework yields a 1.3-point improvement in the average GLUE score and a 2.2-point increase in the F1 score on SQuAD, while reducing computations by around 45%.</p></div>\",\"PeriodicalId\":50638,\"journal\":{\"name\":\"Computer Speech and Language\",\"volume\":\"87 \",\"pages\":\"Article 101646\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000299/pdfft?md5=bb08debe9a20bb5be9d04abbaca1b345&pid=1-s2.0-S0885230824000299-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Speech and Language\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000299\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824000299","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

BERT 等大型语言模型的微调和推理在内存成本和计算资源方面变得越来越昂贵。最近提出的计算灵活的 BERT 模型便于在不同的计算环境中部署。训练这种灵活的 BERT 模型需要联合优化多个 BERT 子网,而这些子网不可避免地会相互干扰。此外,大型子网的性能受限于最小子网与超级子网之间的性能差距,尽管我们努力提高小型子网的性能。为此,我们提出了分层神经嫁接法,以提高 BERT 子网,尤其是大型子网的性能。所提出的方法提高了子网在六项 GLUE 任务中的平均性能,并增强了超级网在所有 GLUE 任务和 SQuAD 数据集中的性能。在提升子网的基础上,我们进一步建立了一个推理框架,通过结合宽度动态门控模块和深度维度的早期退出匝道,针对不同的输入进行实用的宽度和深度动态推理。实验结果表明,在四项 GLUE 任务和 SQuAD 上,在性能和计算复杂度的权衡方面,所提出的框架比其他方法实现了更好的动态推理范围。特别是,我们的最佳权衡推理结果优于计算量相似的其他固定大小模型。与 BERT-Base 相比,所提出的推理框架使 GLUE 平均得分提高了 1.3 分,SQuAD 的 F1 得分提高了 2.2 分,同时减少了约 45% 的计算量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A flexible BERT model enabling width- and depth-dynamic inference

Fine-tuning and inference on Large Language Models like BERT have become increasingly expensive regarding memory cost and computation resources. The recently proposed computation-flexible BERT models facilitate their deployment in varied computational environments. Training such flexible BERT models involves jointly optimizing multiple BERT subnets, which will unavoidably interfere with one another. Besides, the performance of large subnets is limited by the performance gap between the smallest subnet and the supernet, despite efforts to enhance the smaller subnets. In this regard, we propose layer-wise Neural grafting to boost BERT subnets, especially the larger ones. The proposed method improves the average performance of the subnets on six GLUE tasks and boosts the supernets on all GLUE tasks and the SQuAD data set. Based on the boosted subnets, we further build an inference framework enabling practical width- and depth-dynamic inference regarding different inputs by combining width-dynamic gating modules and early exit off-ramps in the depth dimension. Experimental results show that the proposed framework achieves a better dynamic inference range than other methods in terms of trading off performance and computational complexity on four GLUE tasks and SQuAD. In particular, our best-tradeoff inference result outperforms other fixed-size models with similar amount of computations. Compared to BERT-Base, the proposed inference framework yields a 1.3-point improvement in the average GLUE score and a 2.2-point increase in the F1 score on SQuAD, while reducing computations by around 45%.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Computer Speech and Language
Computer Speech and Language 工程技术-计算机:人工智能
CiteScore
11.30
自引率
4.70%
发文量
80
审稿时长
22.9 weeks
期刊介绍: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信