Identifying Equivalent Words from Different Arabic Dialects Using Deep Learning Techniques
Hamed Ramadan, Mohammad M. Alqahtani, Abdullah Algoson
2022 20th International Conference on Language Engineering (ESOLEC), published 2022-10-12
DOI: 10.1109/ESOLEC54569.2022.10009555 (https://doi.org/10.1109/ESOLEC54569.2022.10009555)
Abstract
The Arabic language comprises many spoken dialects. These dialects differ from Modern Standard Arabic (MSA), the standard written form, in syntax, lexicon, phonology, and morphology. Arabic dialects vary not only along a geographical continuum but also along sociolinguistic dimensions such as urban, rural, and Bedouin. Today, Dialectal Arabic (DA) is the primary written language of informal communication in the Arab world, and it appears on social media platforms such as Twitter, in emails, and elsewhere. Research interest in computational models of Arabic dialects has been high over the last decade. Most of these studies focus on Arabic dialect identification (classification) and on building Arabic dialect corpora. However, finding the synonyms of a dialectal word in other Arabic dialects has received limited attention. To bridge this gap, this study develops a model that identifies equivalent words across the dialects of the Arab world using deep learning techniques such as word2vec. The research merged and extended existing Arabic dialect corpora and then applied several deep learning techniques to obtain the best results for dialectal word synonyms. The outcomes are a new dataset of Arabic dialectal word synonyms and a model with an acceptable accuracy of 81%.
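The abstract does not describe how equivalents are retrieved once embeddings have been trained. One common way to use word2vec-style vectors for this purpose is a nearest-neighbour search by cosine similarity, restricted to the target dialect. The sketch below illustrates that idea only and is not the authors' implementation. It assumes that words from all dialects share one embedding space, and the three-dimensional vectors, the transliterated words, and the function names are invented for illustration.

```python
import math

# Toy vectors standing in for trained word2vec embeddings (assumption:
# real vectors would come from a model trained on the merged corpora).
# Keys are (dialect, transliterated word); the numbers are made up.
EMBEDDINGS = {
    ("egyptian", "dilwaqti"): [0.90, 0.10, 0.05],   # "now"
    ("levantine", "hallaq"): [0.85, 0.15, 0.10],    # "now"
    ("gulf", "alhin"): [0.88, 0.05, 0.12],          # "now"
    ("egyptian", "ayez"): [0.10, 0.90, 0.20],       # "want"
    ("levantine", "biddi"): [0.15, 0.85, 0.25],     # "want"
    ("gulf", "abgha"): [0.05, 0.92, 0.15],          # "want"
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def equivalent_word(word, source_dialect, target_dialect, embeddings=EMBEDDINGS):
    """Return (word, similarity) for the target-dialect word closest to `word`."""
    query = embeddings[(source_dialect, word)]
    candidates = [
        (candidate, cosine(query, vector))
        for (dialect, candidate), vector in embeddings.items()
        if dialect == target_dialect
    ]
    # The highest cosine similarity is taken as the proposed equivalent.
    return max(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    match, score = equivalent_word("dilwaqti", "egyptian", "gulf")
    print(match, round(score, 3))
```

With real embeddings, an accuracy figure such as the one reported would typically be computed by checking how often the top-ranked candidate matches a reference synonym list, although the abstract does not state the evaluation protocol.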