A Study of Compressed Language Models in Social Media Domain

Linrui Zhang, Belinda Copus
{"title":"A Study of Compressed Language Models in Social Media Domain","authors":"Linrui Zhang, Belinda Copus","doi":"10.32473/flairs.36.133056","DOIUrl":null,"url":null,"abstract":"Transfer learning from large-scale language models is witnessing incredible growth and popularity in natural language processing (NLP). However, operating these large models always requires a huge amount of computational power and training effort. Many applications leveraging these large models are not very feasible for industrial products since applying them into power-scarce devices, such as mobile phone, is extremely challenging. In this case, model compression, i.e. transform deep and large networks to shallow and small ones, is becoming a popular research trend in NLP community. Currently, there are many techniques available, such as weight pruning and knowledge distillation. The primary concern regarding these techniques is how much of the language understanding capabilities will be retained by the compressed models in a particular domain? In this paper, we conducted a comparative analyses between several popular large-scale language models, such as BERT, RoBERTa, XLNet-Large and their compressed variants, e.g. Distilled BERT, Distilled RoBERTa and etc, and evaluated their performances on three datasets in the social media domain. Experimental results demonstrate that the compressed language models, though consume less computational resources, are able to achieve approximately the same level of language understanding capabilities as the large-scale language models in the social media domain.","PeriodicalId":302103,"journal":{"name":"The International FLAIRS Conference Proceedings","volume":"9 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International FLAIRS Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32473/flairs.36.133056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Transfer learning from large-scale language models is witnessing incredible growth and popularity in natural language processing (NLP). However, operating these large models requires substantial computational power and training effort. Many applications built on these large models are impractical for industrial products because deploying them on resource-constrained devices, such as mobile phones, is extremely challenging. Consequently, model compression, i.e., transforming deep and large networks into shallow and small ones, has become a popular research trend in the NLP community. Many techniques are currently available, such as weight pruning and knowledge distillation. The primary concern regarding these techniques is how much of the language understanding capability the compressed models retain in a particular domain. In this paper, we conduct a comparative analysis of several popular large-scale language models (BERT, RoBERTa, and XLNet-Large) and their compressed variants (e.g., Distilled BERT and Distilled RoBERTa), evaluating their performance on three datasets in the social media domain. Experimental results demonstrate that the compressed language models, although they consume fewer computational resources, achieve approximately the same level of language understanding as the large-scale language models in the social media domain.
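To make the full-model-versus-compressed-model comparison concrete, the sketch below (not the authors' code) loads a full encoder and a distilled counterpart side by side with the Hugging Face transformers library and contrasts their parameter counts on a single forward pass. The checkpoint names, the binary sentiment head, and the tweet-style example sentence are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: compare a full-size encoder with a distilled variant.
# Assumptions: public Hugging Face checkpoints, a 2-class sentiment head
# (randomly initialized, would still need fine-tuning), and a made-up
# social-media-style sentence.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAMES = [
    "bert-base-uncased",        # full-size encoder (~110M parameters)
    "distilbert-base-uncased",  # distilled variant (~66M parameters)
]

text = "just got the new phone and the battery life is unreal lol"

for name in MODEL_NAMES:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    # Count trainable parameters to quantify the size gap.
    n_params = sum(p.numel() for p in model.parameters())

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    print(f"{name}: {n_params / 1e6:.1f}M parameters, logits={logits.tolist()}")
```

In the paper itself, both model families are fine-tuned and evaluated on three social media datasets; this snippet only illustrates the size difference and a single untrained inference step.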