A Study of Compressed Language Models in Social Media Domain

Linrui Zhang, Belinda Copus
{"title":"A Study of Compressed Language Models in Social Media Domain","authors":"Linrui Zhang, Belinda Copus","doi":"10.32473/flairs.36.133056","DOIUrl":null,"url":null,"abstract":"Transfer learning from large-scale language models is witnessing incredible growth and popularity in natural language processing (NLP). However, operating these large models always requires a huge amount of computational power and training effort. Many applications leveraging these large models are not very feasible for industrial products since applying them into power-scarce devices, such as mobile phone, is extremely challenging. In this case, model compression, i.e. transform deep and large networks to shallow and small ones, is becoming a popular research trend in NLP community. Currently, there are many techniques available, such as weight pruning and knowledge distillation. The primary concern regarding these techniques is how much of the language understanding capabilities will be retained by the compressed models in a particular domain? In this paper, we conducted a comparative analyses between several popular large-scale language models, such as BERT, RoBERTa, XLNet-Large and their compressed variants, e.g. Distilled BERT, Distilled RoBERTa and etc, and evaluated their performances on three datasets in the social media domain. Experimental results demonstrate that the compressed language models, though consume less computational resources, are able to achieve approximately the same level of language understanding capabilities as the large-scale language models in the social media domain.","PeriodicalId":302103,"journal":{"name":"The International FLAIRS Conference Proceedings","volume":"9 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International FLAIRS Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32473/flairs.36.133056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Transfer learning from large-scale language models is witnessing incredible growth and popularity in natural language processing (NLP). However, operating these large models requires substantial computational power and training effort. Many applications built on these large models are impractical for industrial products because deploying them on resource-constrained devices, such as mobile phones, is extremely challenging. Consequently, model compression, i.e., transforming deep and large networks into shallow and small ones, has become a popular research trend in the NLP community. Many techniques are currently available, such as weight pruning and knowledge distillation. The primary concern regarding these techniques is how much of the language understanding capability the compressed models retain in a particular domain. In this paper, we conduct a comparative analysis of several popular large-scale language models (BERT, RoBERTa, and XLNet-Large) and their compressed variants (e.g., Distilled BERT and Distilled RoBERTa), evaluating their performance on three datasets in the social media domain. Experimental results demonstrate that the compressed language models, although they consume fewer computational resources, achieve approximately the same level of language understanding as the large-scale language models in the social media domain.
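To make the full-model-versus-compressed-model comparison concrete, the sketch below (not the authors' code) loads a full encoder and a distilled counterpart side by side with the Hugging Face transformers library and contrasts their parameter counts on a single forward pass. The checkpoint names, the binary sentiment head, and the tweet-style example sentence are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: compare a full-size encoder with a distilled variant.
# Assumptions: public Hugging Face checkpoints, a 2-class sentiment head
# (randomly initialized, would still need fine-tuning), and a made-up
# social-media-style sentence.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAMES = [
    "bert-base-uncased",        # full-size encoder (~110M parameters)
    "distilbert-base-uncased",  # distilled variant (~66M parameters)
]

text = "just got the new phone and the battery life is unreal lol"

for name in MODEL_NAMES:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    # Count trainable parameters to quantify the size gap.
    n_params = sum(p.numel() for p in model.parameters())

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    print(f"{name}: {n_params / 1e6:.1f}M parameters, logits={logits.tolist()}")
```

In the paper itself, both model families are fine-tuned and evaluated on three social media datasets; this snippet only illustrates the size difference and a single untrained inference step.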