Emran Al-Bashabsheh, Huthaifa Al-Khazaleh, Omar N. Elayan, R. Duwairi
Commonsense Validation for Arabic Sentences using Deep Learning
2021 22nd International Arab Conference on Information Technology (ACIT), published 2021-12-21
DOI: 10.1109/acit53391.2021.9677156 (https://doi.org/10.1109/acit53391.2021.9677156)
Abstract
Arabic language processing is one of the largest challenges researchers face, owing to the complex characteristics of the Arabic language. With recent advances in deep learning and the vast amounts of data now available, human-like performance has become achievable on several tasks. One such task is commonsense validation: deciding whether a sentence makes sense, in any language in general and in Arabic in particular. Arabic commonsense validation is still an emerging area within Arabic language understanding. In this paper, a set of multilingual pre-trained transformer models is employed. The models were trained and tested on an Arabic commonsense understanding dataset containing 12 thousand sentence pairs. The two sentences in each pair are very similar in syntax; however, one is logical (makes sense) and the other is illogical. BERT, XLM-MLM, and XLM-RoBERTa models were fine-tuned to identify the logical sentence in each pair. The best result was achieved with XLM-RoBERTa, at an accuracy of 81.2%.
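The pairwise task described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for a fine-tuned model's plausibility score, the English pairs in `TOY_PAIRS` are invented for illustration and are not taken from the paper's Arabic dataset, and breaking ties in favor of the first sentence is an assumption.

```python
from typing import Callable, Sequence, Tuple

# Each pair holds two near-identical sentences and the index (0 or 1)
# of the sentence that is logical.
Pair = Tuple[str, str, int]


def pick_logical(pair: Pair, score: Callable[[str], float]) -> int:
    """Return the index of the sentence with the higher plausibility score."""
    first, second, _ = pair
    # Assumption: ties go to the first sentence.
    return 0 if score(first) >= score(second) else 1


def pairwise_accuracy(pairs: Sequence[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs in which the logical sentence is picked."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(pick_logical(pair, score) == pair[2] for pair in pairs)
    return correct / len(pairs)


# Hypothetical word list used only by the toy scorer below.
IMPLAUSIBLE_WORDS = {"elephant", "table"}


def toy_score(sentence: str) -> float:
    """Hypothetical scorer: 0.0 if a flagged word appears, otherwise 1.0.

    A real system would return a score from a fine-tuned transformer instead.
    """
    words = sentence.lower().replace(".", "").split()
    return 0.0 if any(word in IMPLAUSIBLE_WORDS for word in words) else 1.0


# Invented illustrative pairs (not from the paper's dataset).
TOY_PAIRS = [
    ("He put the milk in the fridge.", "He put the elephant in the fridge.", 0),
    ("She drinks the table every morning.", "She drinks coffee every morning.", 1),
    # The toy scorer cannot tell these two apart, so this pair is missed.
    ("The sun rises in the kitchen.", "The sun rises in the east.", 1),
]

if __name__ == "__main__":
    print(pairwise_accuracy(TOY_PAIRS, toy_score))
```

Because the metric only compares the two scores within a pair, any scorer that maps a sentence to a number can be plugged in; the toy scorer gets two of the three invented pairs right.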