Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs

Wafa Abdullah Alrajhi, H. Al-Khalifa, Abdulmalik Alsalman
{"title":"Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs","authors":"Wafa Abdullah Alrajhi, H. Al-Khalifa, Abdulmalik Alsalman","doi":"10.18653/v1/2022.wanlp-1.17","DOIUrl":null,"url":null,"abstract":"Despite the noticeable progress that we recently witnessed in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conducted a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using Minimum Pairs (MP), where each pair represents a grammatical sentence and its contradictory counterpart. MPs isolate specific linguistic knowledge to test the model’s sensitivity in understanding a specific linguistic phenomenon. We cover nine major Arabic phenomena: Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compared the results of fifteen Arabic BERT-based PLMs. Overall, among all tested models, CAMeL-CA outperformed the other PLMs by achieving the highest overall accuracy.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Despite the noticeable progress recently made in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we conducted a study to evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using Minimal Pairs (MPs), where each pair consists of a grammatical sentence and a minimally different ungrammatical counterpart. MPs isolate specific linguistic knowledge to test a model's sensitivity to a specific linguistic phenomenon. We cover nine major Arabic phenomena, including verbal sentences, nominal sentences, adjective modification, and Idafa construction. The experiments compared fifteen Arabic BERT-based PLMs. Overall, among all tested models, CAMeL-CA achieved the highest overall accuracy.
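To make the evaluation protocol concrete, the following is a minimal sketch (not the authors' released code) of how minimal-pair scoring is commonly implemented for BERT-style models: pseudo-log-likelihood scoring in the sense of Salazar et al. (2020), using the Hugging Face transformers library. The CAMeL-CA checkpoint name below is the public CAMeL-Lab release and stands in for any of the fifteen evaluated models; the helper function names are illustrative only.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Public CAMeL-CA checkpoint (assumed here; any Arabic BERT model works).
MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-ca"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum log P(token | rest of sentence), masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def prefers_grammatical(good: str, bad: str) -> bool:
    # A pair counts as correct when the grammatical sentence scores higher.
    return pseudo_log_likelihood(good) > pseudo_log_likelihood(bad)

Because BERT-style LMs are bidirectional and define no left-to-right sentence probability, summing masked-token log-probabilities is the usual workaround; overall accuracy is then the fraction of minimal pairs for which the grammatical member outscores its ungrammatical counterpart.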