SudaBERT: A Pre-trained Encoder Representation For Sudanese Arabic Dialect
Mukhtar Elgezouli, Khalid N. Elmadani, Muhammed Saeed
2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE)
Published: 2021-02-26
DOI: 10.1109/ICCCEEE49695.2021.9429651 (https://doi.org/10.1109/ICCCEEE49695.2021.9429651)
Citations: 3
Abstract
Bidirectional Encoder Representations from Transformers (BERT) has proven highly effective at Natural Language Understanding (NLU), achieving state-of-the-art results on most NLU tasks. In this work we aim to harness the power of BERT for the Sudanese Arabic dialect and produce a Sudanese word representation. We collected over 7 million sentences in the Sudanese dialect and used them to continue pre-training Arabic-BERT, which was originally trained on a large Modern Standard Arabic (MSA) corpus. Our model, SudaBERT, achieves better performance on Sudanese sentiment analysis, which indicates that SudaBERT is better at understanding the Sudanese dialect, the domain we are interested in.
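
The abstract's central step is continued pre-training: taking an MSA-trained Arabic BERT checkpoint and resuming its masked-language-model training on the Sudanese-dialect corpus. The sketch below illustrates that step with the HuggingFace transformers library; the checkpoint name asafaya/bert-base-arabic, the corpus file sudanese_sentences.txt, and all hyperparameters are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of continued masked-language-model (MLM) pre-training.
# Assumptions (not from the paper): the "asafaya/bert-base-arabic" checkpoint
# and a plain-text file "sudanese_sentences.txt" with one sentence per line.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "asafaya/bert-base-arabic"  # assumed Arabic-BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load the dialect corpus: one Sudanese sentence per line.
dataset = load_dataset("text", data_files={"train": "sudanese_sentences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT MLM objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sudabert",
        num_train_epochs=1,               # illustrative hyperparameters
        per_device_train_batch_size=32,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()          # resumes pre-training on the dialect corpus
trainer.save_model("sudabert")
```

To reproduce the evaluation described above, the resulting checkpoint would then be fine-tuned on a labeled Sudanese sentiment dataset in the usual sequence-classification setup.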