Commonsense Validation for Arabic Sentences using Deep Learning

2021 22nd International Arab Conference on Information Technology (ACIT). Pub Date: 2021-12-21. DOI: 10.1109/acit53391.2021.9677156

Emran Al-Bashabsheh, Huthaifa Al-Khazaleh, Omar N. Elayan, R. Duwairi

Abstract

Arabic language processing remains one of the major challenges facing researchers because of the complexity of the language. With recent advances in deep learning and the large amounts of data now available, human-like performance has become attainable on several tasks. One such task is commonsense validation: deciding whether a sentence makes sense, in languages in general and in Arabic in particular. Arabic commonsense validation is still an emerging area within Arabic language understanding. In this paper, a set of multilingual pre-trained transformer models was employed. These models were trained and tested on an Arabic commonsense understanding dataset containing 12 thousand pairs of sentences. The two sentences in every pair are very similar in syntax; however, one is logical (makes sense) and the other is illogical. BERT, XLM-MLM, and XLM-RoBERTa models were fine-tuned to identify the logical sentence in each pair. The best result was achieved with XLM-RoBERTa, with an accuracy of 81.2%.
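The abstract does not say how predictions are scored, so the following is only a minimal sketch of one plausible reading of the task: each sentence in a pair receives a plausibility score, the higher-scoring sentence is predicted as the logical one, and accuracy is the fraction of pairs predicted correctly. The names `pick_logical`, `pairwise_accuracy`, and `toy_score` are hypothetical, the English example pairs are invented for illustration, and the keyword-based scorer merely stands in for a fine-tuned model.

```python
from typing import Callable, List, Tuple

# One example: (sentence_a, sentence_b, index of the logical sentence: 0 or 1).
Pair = Tuple[str, str, int]


def pick_logical(score_a: float, score_b: float) -> int:
    """Return the index of the higher-scoring sentence (ties go to index 0)."""
    return 0 if score_a >= score_b else 1


def pairwise_accuracy(pairs: List[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs in which the predicted logical sentence matches the label."""
    if not pairs:
        return 0.0
    correct = 0
    for sentence_a, sentence_b, gold in pairs:
        predicted = pick_logical(score(sentence_a), score(sentence_b))
        correct += int(predicted == gold)
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in for a model: penalize sentences mentioning 'moon'."""
    return 0.0 if "moon" in sentence else 1.0


# Invented pairs, not taken from the paper's dataset.
toy_pairs: List[Pair] = [
    ("He put the milk in the fridge.", "He put the moon in the fridge.", 0),
    ("She drank the moon.", "She drank water.", 1),
    ("Fish live in trees.", "Fish live in water.", 1),  # the toy scorer misses this one
]

if __name__ == "__main__":
    print(f"pairwise accuracy: {pairwise_accuracy(toy_pairs, toy_score):.3f}")
```

In a real setup, `score` would wrap a fine-tuned classifier such as one of the models named in the abstract; the accuracy computation itself would stay the same.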
