Commonsense Validation for Arabic Sentences using Deep Learning

Emran Al-Bashabsheh, Huthaifa Al-Khazaleh, Omar N. Elayan, R. Duwairi
Published in: 2021 22nd International Arab Conference on Information Technology (ACIT), 21 December 2021. DOI: 10.1109/acit53391.2021.9677156

Abstract

Arabic language processing is one of the greatest challenges facing researchers because of the complex characteristics of the Arabic language. With recent advances in deep learning and the vast amounts of data now available, it is possible to achieve human-like performance on several tasks. One such task is distinguishing sentences that agree with common sense from those that do not, in languages in general and in Arabic in particular. Arabic commonsense validation remains an emerging area within Arabic language understanding. In this paper, a set of multilingual pre-trained transformer models was employed. These models were trained and tested on an Arabic commonsense understanding dataset containing 12,000 sentence pairs. The two sentences in each pair are very similar in syntax; however, one of them is logical (makes sense) and the other is illogical. BERT, XLM-MLM, and XLM-RoBERTa models were fine-tuned to identify the logical sentence in each pair. The best result was achieved by XLM-RoBERTa, with an accuracy of 81.2%.
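The task is framed as choosing, within each pair, the sentence that makes sense, and models are compared by accuracy. The sketch below illustrates that decision rule and metric, assuming a fine-tuned model has already assigned one plausibility score to each sentence of a pair. The class and function names, the label convention (index of the logical sentence), and the softmax-over-two-scores framing are illustrative assumptions, not the authors' published setup.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SentencePair:
    """One item: two near-identical sentences, exactly one of which makes sense."""
    sent0: str
    sent1: str
    label: int  # hypothetical convention: index (0 or 1) of the logical sentence


def pair_probabilities(score0: float, score1: float) -> Tuple[float, float]:
    """Softmax over the two per-sentence scores (assumed multiple-choice style head)."""
    m = max(score0, score1)  # subtract the max for numerical stability
    e0, e1 = math.exp(score0 - m), math.exp(score1 - m)
    total = e0 + e1
    return e0 / total, e1 / total


def predict_logical(score0: float, score1: float) -> int:
    """Return the index of the sentence with the higher score (ties go to sentence 0)."""
    return 0 if score0 >= score1 else 1


def accuracy(pairs: Sequence[SentencePair],
             scores: Sequence[Tuple[float, float]]) -> float:
    """Fraction of pairs in which the logical sentence receives the higher score."""
    if len(pairs) != len(scores):
        raise ValueError("one score tuple is needed per pair")
    if not pairs:
        raise ValueError("empty evaluation set")
    correct = sum(predict_logical(s0, s1) == p.label
                  for p, (s0, s1) in zip(pairs, scores))
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical pairs (English placeholders) and made-up model scores.
    pairs = [
        SentencePair("He drank a glass of water.", "He drank a glass of sand.", 0),
        SentencePair("The fish drove to work.", "The man drove to work.", 1),
        SentencePair("She read the book.", "The book read her.", 0),
    ]
    scores = [(2.1, -0.3), (0.4, 1.7), (-0.5, 0.9)]
    print(round(accuracy(pairs, scores), 4))  # 2 of 3 pairs resolved correctly
```

Under this formulation, an accuracy figure such as the reported 81.2% would be the fraction of test pairs whose logical sentence the model scores higher; the softmax is not needed for the decision itself but shows how two raw scores become a probability over the pair.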