Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs

Workshop on Arabic Natural Language Processing. DOI: 10.18653/v1/2022.wanlp-1.17

Wafa Abdullah Alrajhi, H. Al-Khalifa, Abdulmalik Alsalman

Abstract

Despite the noticeable recent progress in Arabic pre-trained language models (PLMs), the linguistic knowledge these models capture remains unclear. In this paper, we evaluate the linguistic knowledge of available Arabic PLMs. BERT-based language models (LMs) are evaluated using Minimal Pairs (MPs), where each pair consists of a grammatical sentence and an ungrammatical counterpart. MPs isolate specific linguistic knowledge to test a model's sensitivity to a particular linguistic phenomenon. We cover nine major Arabic phenomena, including Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compare the results of fifteen Arabic BERT-based PLMs. Among all tested models, CAMeL-CA achieved the highest overall accuracy.
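To make the minimal-pair protocol concrete, here is a minimal sketch of one common way to run it with a masked LM. The abstract does not state how sentences are scored, so this is an assumption. Each sentence gets a pseudo-log-likelihood, computed by masking one token at a time and summing the log-probabilities of the original tokens. A pair counts as correct when the grammatical sentence scores higher. The `masked_prob` interface, the `toy_masked_prob` stand-in, and `AGREEING_BIGRAMS` are hypothetical and replace a real PLM. Whitespace tokenization is also a simplification. The two Arabic pairs are illustrative only and are not taken from the paper's dataset.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: probability a masked LM assigns to tokens[i]
# when position i is masked and all other tokens are visible.
MaskedProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_prob: MaskedProb) -> float:
    """Sum of log P(tokens[i] | all other tokens), masking one position at a time."""
    return sum(math.log(masked_prob(tokens, i)) for i in range(len(tokens)))


def minimal_pair_accuracy(
    pairs: Sequence[tuple[str, str]], masked_prob: MaskedProb
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs in which the
    grammatical sentence receives the higher pseudo-log-likelihood."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = 0
    for good, bad in pairs:
        # Whitespace tokenization is a simplification; a BERT-based model
        # would use its own subword tokenizer here.
        good_score = pseudo_log_likelihood(good.split(), masked_prob)
        bad_score = pseudo_log_likelihood(bad.split(), masked_prob)
        correct += good_score > bad_score
    return correct / len(pairs)


# Hypothetical toy stand-in for a real PLM: it "knows" only these
# agreeing bigrams and gives every other token a low probability.
AGREEING_BIGRAMS = {("الولد", "الطويل"), ("ذهب", "الولد")}


def toy_masked_prob(tokens: Sequence[str], i: int) -> float:
    neighbours = []
    if i > 0:
        neighbours.append((tokens[i - 1], tokens[i]))
    if i + 1 < len(tokens):
        neighbours.append((tokens[i], tokens[i + 1]))
    return 0.9 if any(b in AGREEING_BIGRAMS for b in neighbours) else 0.1


PAIRS = [
    ("الولد الطويل", "الولد الطويلة"),  # adjective modification: gender agreement
    ("ذهب الولد", "ذهبت الولد"),  # verbal sentence: verb-subject gender agreement
]

print(minimal_pair_accuracy(PAIRS, toy_masked_prob))  # 1.0
```

With a real model, `masked_prob` would require one forward pass per masked position. Scoring an n-token sentence therefore costs n passes, which is the usual price of pseudo-log-likelihood scoring for encoder-only models such as BERT.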
