ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

North American Chapter of the Association for Computational Linguistics | Pub Date: 2022-05-03 | DOI: 10.48550/arXiv.2205.01523

Junyi Li, Tianyi Tang, Zheng Gong, Lixin Yang, Zhuohao Yu, Z. Chen, Jingyuan Wang, Wayne Xin Zhao, Ji-rong Wen


Citations: 5

Abstract


ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

Pretrained language models (PLMs) now dominate the majority of NLP tasks, yet little research has systematically evaluated their language abilities. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). We design four evaluation dimensions (memory, comprehension, reasoning, and composition) to measure ten widely used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs on downstream tasks is usually sensitive to the data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for deeper and more detailed analysis of the language abilities of PLMs. This paper can guide future work in selecting, applying, and designing PLMs for specific tasks. We have made all the details of experiments publicly available at https://github.com/RUCAIBox/ElitePLM.
