{"title":"Hybrid Language Models for Improved Multilingual","authors":"Mohammed Abd, Elmoneim Al Salamony","doi":"10.21608/djicsi.2024.368659","journal":{"name":"مجلة الدلتا الدولية للعلوم التجارية ونظم المعلومات","volume":"274 9","pages":""},"publicationDate":"2024-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.21608/djicsi.2024.368659"}
The rapid evolution of social media has facilitated deep insights into user opinions. However, sentiment analysis, particularly for low-resource languages like Arabic, remains underexplored due to limited resources. This study addresses this gap by investigating sentiment analysis on tweet texts from three sources: SemEval-17, the 2.5+ Million Rows Egyptian Datasets Collection, and the Arabic Sentiment Tweet dataset. We evaluated four pretrained language models and introduced two ensemble models. Our results demonstrate that monolingual models showed superior performance, while the ensemble models surpassed baseline results. The majority voting ensemble achieved the best performance, even outperforming English-language benchmarks.
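The abstract names majority voting as the best-performing ensemble but does not describe how it is built. The following is a minimal sketch of hard majority voting over per-model label predictions, not the authors' implementation. The function name `majority_vote`, the label strings, the example predictions, and the tie-breaking rule (the tied label voted by the earliest model wins) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-model label predictions by hard majority voting.

    predictions[m][i] is the label that model m assigns to example i.
    Ties go to the tied label voted by the earliest model in the list
    (an assumption; the abstract does not describe tie-breaking).
    """
    if not predictions:
        return []
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("every model must predict every example")

    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in model order, that attains the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical outputs of four models on three tweets (illustrative only).
model_outputs = [
    ["positive", "negative", "neutral"],
    ["positive", "positive", "neutral"],
    ["negative", "negative", "positive"],
    ["positive", "negative", "neutral"],
]
ensemble_labels = majority_vote(model_outputs)
print(ensemble_labels)
```

With an even number of voters, as here, ties are possible, so any real ensemble needs an explicit rule for them. Ordering the models by validation performance is one option under the rule sketched above.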