使用最小对评估阿拉伯语预训练语言模型中的语言知识

Workshop on Arabic Natural Language Processing Pub Date : 1900-01-01 DOI:10.18653/v1/2022.wanlp-1.17

Wafa Abdullah Alrajhi, H. Al-Khalifa, Abdulmalik Alsalman

{"title":"使用最小对评估阿拉伯语预训练语言模型中的语言知识","authors":"Wafa Abdullah Alrajhi, H. Al-Khalifa, Abdulmalik Alsalman","doi":"10.18653/v1/2022.wanlp-1.17","DOIUrl":null,"url":null,"abstract":"Despite the noticeable progress that we recently witnessed in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conducted a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using Minimum Pairs (MP), where each pair represents a grammatical sentence and its contradictory counterpart. MPs isolate specific linguistic knowledge to test the model’s sensitivity in understanding a specific linguistic phenomenon. We cover nine major Arabic phenomena: Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compared the results of fifteen Arabic BERT-based PLMs. Overall, among all tested models, CAMeL-CA outperformed the other PLMs by achieving the highest overall accuracy.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs\",\"authors\":\"Wafa Abdullah Alrajhi, H. Al-Khalifa, Abdulmalik Alsalman\",\"doi\":\"10.18653/v1/2022.wanlp-1.17\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Despite the noticeable progress that we recently witnessed in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conducted a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using Minimum Pairs (MP), where each pair represents a grammatical sentence and its contradictory counterpart. MPs isolate specific linguistic knowledge to test the model’s sensitivity in understanding a specific linguistic phenomenon. We cover nine major Arabic phenomena: Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compared the results of fifteen Arabic BERT-based PLMs. Overall, among all tested models, CAMeL-CA outperformed the other PLMs by achieving the highest overall accuracy.\",\"PeriodicalId\":355149,\"journal\":{\"name\":\"Workshop on Arabic Natural Language Processing\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Arabic Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.wanlp-1.17\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

尽管我们最近在阿拉伯语预训练语言模型(PLMs)方面取得了显著的进展，但这些模型所捕获的语言知识仍然不清楚。在本文中，我们进行了一项研究，以评估可用的阿拉伯语plm的语言知识。基于bert的语言模型(lm)使用最小对(MP)进行评估，其中每个对代表一个语法句子及其矛盾的对应句。MPs将特定的语言知识分离出来，以测试模型在理解特定语言现象时的敏感性。我们涵盖了九个主要的阿拉伯语现象:动句、名句、形容词修饰和Idafa结构。实验比较了15种基于阿拉伯语bert的plm的结果。总体而言，在所有测试的模型中，CAMeL-CA通过实现最高的整体精度而优于其他plm。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs

Despite the noticeable progress that we recently witnessed in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conducted a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using Minimum Pairs (MP), where each pair represents a grammatical sentence and its contradictory counterpart. MPs isolate specific linguistic knowledge to test the model’s sensitivity in understanding a specific linguistic phenomenon. We cover nine major Arabic phenomena: Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compared the results of fifteen Arabic BERT-based PLMs. Overall, among all tested models, CAMeL-CA outperformed the other PLMs by achieving the highest overall accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Workshop on Arabic Natural Language Processing

自引率

0.00%

发文量