Mukhtar Elgezouli, Khalid N. Elmadani, Muhammed Saeed
2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE)
Published: 2021-02-26 · DOI: 10.1109/ICCCEEE49695.2021.9429651 · Citations: 3
SudaBERT: A Pre-trained Encoder Representation For Sudanese Arabic Dialect
Bidirectional Encoder Representations from Transformers (BERT) has proven highly effective at Natural Language Understanding (NLU), achieving state-of-the-art results on most NLU tasks. In this work we apply BERT to the Sudanese Arabic dialect to produce Sudanese word representations. We collected over 7 million sentences in the Sudanese dialect and used them to continue pre-training Arabic-BERT, which was originally trained on a large Modern Standard Arabic (MSA) corpus. Our model, SudaBERT, achieves better performance on Sudanese sentiment analysis, indicating that it better understands the Sudanese dialect, the domain we are interested in.
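The continued pre-training described above relies on BERT's masked-language-model (MLM) objective: a fraction of input tokens is corrupted and the model learns to recover the originals. A minimal sketch of the standard BERT corruption step follows; this is not the authors' code, and the token strings and vocabulary are illustrative placeholders.

```python
import random

MASK = "[MASK]"
# Illustrative stand-in vocabulary; a real run would use the WordPiece vocab.
VOCAB = ["tok%d" % i for i in range(100)]

def mlm_mask(tokens, rng, mask_prob=0.15):
    """BERT-style MLM corruption: select ~15% of positions as targets;
    of those, 80% become [MASK], 10% a random vocabulary token, and
    10% stay unchanged. Returns (corrupted, labels), where labels[i]
    holds the original token at target positions and None elsewhere."""
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok           # model must predict this token
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK   # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random token
            # remaining 10%: leave the original token in place
    return corrupted, labels

# Example: corrupt a toy sentence with a seeded RNG for reproducibility.
rng = random.Random(42)
corrupted, labels = mlm_mask(["we", "collected", "sudanese", "sentences"], rng)
```

During continued pre-training, only the target positions (non-None labels) contribute to the cross-entropy loss; keeping 10% of targets unchanged discourages the model from assuming every unmasked token is correct.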