Advancing Arabic dialect detection with hybrid stacked transformer models

IF 2.4 · CAS Tier 3 (Medicine) · JCR Q3 · NEUROSCIENCES
Frontiers in Human Neuroscience | Pub Date: 2025-02-11 | eCollection Date: 2025-01-01 | DOI: 10.3389/fnhum.2025.1498297
Hager Saleh, Abdulaziz AlMohimeed, Rasha Hassan, Mandour M Ibrahim, Saeed Hamood Alsamhi, Moatamad Refaat Hassan, Sherif Mostafa
{"title":"Advancing arabic dialect detection with hybrid stacked transformer models.","authors":"Hager Saleh, Abdulaziz AlMohimeed, Rasha Hassan, Mandour M Ibrahim, Saeed Hamood Alsamhi, Moatamad Refaat Hassan, Sherif Mostafa","doi":"10.3389/fnhum.2025.1498297","DOIUrl":null,"url":null,"abstract":"<p><p>The rapid expansion of dialectally unique Arabic material on social media and the internet highlights how important it is to categorize dialects accurately to maximize a variety of Natural Language Processing (NLP) applications. The improvement in classification performance highlights the wider variety of linguistic variables that the model can capture, providing a reliable solution for precise Arabic dialect recognition and improving the efficacy of NLP applications. Recent advances in deep learning (DL) models have shown promise in overcoming potential challenges in identifying Arabic dialects. In this paper, we propose a novel stacking model based on two transformer models, i.e., Bert-Base-Arabertv02 and Dialectal-Arabic-XLM-R-Base, to enhance the classification of dialectal Arabic. The proposed model consists of two levels, including base models and meta-learners. In the proposed model, Level 1 generates class probabilities from two transformer models for training and testing sets, which are then used in Level 2 to train and evaluate a meta-learner. The stacking model compares various models, including long-short-term memory (LSTM), gated recurrent units (GRU), convolutional neural network (CNN), and two transformer models using different word embedding. The results show that the stacking model combination of two models archives outperformance over single-model approaches due to capturing a broader range of linguistic features, which leads to better generalization across different forms of Arabic. The proposed model is evaluated based on the performance of IADD and Shami. For Shami, the Stacking-Transformer achieves the highest performance in all rates compared to other models with 89.73 accuracy, 89.596 precision, 89.73 recall, and 89.574 F1-score. For IADD, the Stacking-Transformer achieves the highest performance in all rates compared to other models with 93.062 accuracy, 93.368 precision, 93.062 recall, and 93.184 F1 score. The improvement in classification performance highlights the wider variety of linguistic variables that the model can capture, providing a reliable solution for precise Arabic dialect recognition and improving the efficacy of NLP applications.</p>","PeriodicalId":12536,"journal":{"name":"Frontiers in Human Neuroscience","volume":"19 ","pages":"1498297"},"PeriodicalIF":2.4000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11850318/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Human Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fnhum.2025.1498297","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

The rapid expansion of dialect-specific Arabic content on social media and the web underscores the importance of accurate dialect classification for a wide range of Natural Language Processing (NLP) applications. Recent advances in deep learning (DL) have shown promise in overcoming the challenges of identifying Arabic dialects. In this paper, we propose a novel stacking model built on two transformer models, Bert-Base-Arabertv02 and Dialectal-Arabic-XLM-R-Base, to improve the classification of dialectal Arabic. The proposed model consists of two levels: base models and a meta-learner. At Level 1, the two transformer models generate class probabilities for the training and testing sets; at Level 2, these probabilities are used to train and evaluate a meta-learner. The stacking model is compared against several baselines, including long short-term memory (LSTM), gated recurrent unit (GRU), and convolutional neural network (CNN) models with different word embeddings, as well as the two transformer models used individually. The results show that stacking the two models outperforms single-model approaches because the combination captures a broader range of linguistic features, which leads to better generalization across different forms of Arabic. The proposed model is evaluated on the IADD and Shami datasets. On Shami, the Stacking-Transformer achieves the highest scores across all metrics, with 89.73% accuracy, 89.596% precision, 89.73% recall, and an 89.574% F1-score. On IADD, it again achieves the highest scores across all metrics, with 93.062% accuracy, 93.368% precision, 93.062% recall, and a 93.184% F1-score. This improvement in classification performance reflects the wider variety of linguistic features the model can capture, providing a reliable solution for precise Arabic dialect recognition and improving the efficacy of NLP applications.
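The two-level stacking scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for the class-probability matrices that Bert-Base-Arabertv02 and Dialectal-Arabic-XLM-R-Base would produce at Level 1, and a scikit-learn LogisticRegression is assumed as the Level 2 meta-learner (the abstract does not specify which meta-learner is used).

```python
# Minimal sketch of two-level stacking: Level 1 base models emit class
# probabilities; Level 2 trains a meta-learner on those probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 800, 200, 5  # hypothetical dataset sizes

# Level 1: class probabilities from each base transformer.
# Placeholders here; in practice these come from the fine-tuned
# Bert-Base-Arabertv02 and Dialectal-Arabic-XLM-R-Base models.
p1_train = rng.dirichlet(np.ones(n_classes), n_train)
p2_train = rng.dirichlet(np.ones(n_classes), n_train)
p1_test = rng.dirichlet(np.ones(n_classes), n_test)
p2_test = rng.dirichlet(np.ones(n_classes), n_test)
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)

# Level 2: concatenate the two probability vectors per example and
# fit the meta-learner on the combined features.
X_train = np.hstack([p1_train, p2_train])
X_test = np.hstack([p1_test, p2_test])

meta = LogisticRegression(max_iter=1000)  # assumed meta-learner
meta.fit(X_train, y_train)
print("meta-learner accuracy:", accuracy_score(y_test, meta.predict(X_test)))
```

With real base-model probabilities, the meta-learner learns how much to trust each transformer per class, which is what lets the stack generalize better than either model alone.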

Source journal
Frontiers in Human Neuroscience (Medicine / Neuroscience)
CiteScore: 4.70
Self-citation rate: 6.90%
Articles per year: 830
Review time: 2-4 weeks
About the journal: Frontiers in Human Neuroscience is a first-tier electronic journal devoted to understanding the brain mechanisms supporting cognitive and social behavior in humans, and how these mechanisms might be altered in disease states. The last 25 years have seen an explosive growth in both the methods and the theoretical constructs available to study the human brain. Advances in electrophysiological, neuroimaging, neuropsychological, psychophysical, neuropharmacological and computational approaches have provided key insights into the mechanisms of a broad range of human behaviors in both health and disease. Work in human neuroscience ranges from the cognitive domain, including areas such as memory, attention, language and perception, to the social domain, with this last subject addressing topics such as interpersonal interactions, social discourse and emotional regulation. How these processes unfold during development, mature in adulthood and often decline in aging, and how they are altered in a host of developmental, neurological and psychiatric disorders, has become increasingly amenable to human neuroscience research approaches. Work in human neuroscience has influenced many areas of inquiry, ranging from social and cognitive psychology to economics, law and public policy. Accordingly, our journal will provide a forum for human research spanning all areas of human cognitive, social, developmental and translational neuroscience using any research approach.