A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect

Fatemah Husain, Hana Al-Ostad, Halima Omar
{"title":"A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect","authors":"Fatemah Husain, Hana Al-Ostad, Halima Omar","doi":"10.18653/v1/2022.wanlp-1.15","DOIUrl":null,"url":null,"abstract":"Developing a system for sentiment analysis is very challenging for the Arabic language due to the limitations in the available Arabic datasets. Many Arabic dialects are still not studied by researchers in Arabic sentiment analysis due to the complexity of annotators’ recruitment process during dataset creation. This paper covers the research gap in sentiment analysis for the Kuwaiti dialect by proposing a weak supervised approach to develop a large labeled dataset. Our dataset consists of over 16.6k tweets with 7,905 negatives, 7,902 positives, and 860 neutrals that spans several themes and time frames to remove any bias that might affect its content. The annotation agreement between our proposed system’s labels and human-annotated labels reports 93% for the pairwise percent agreement and 0.87 for Cohen’s kappa coefficient. Furthermore, we evaluate our dataset using multiple traditional machine learning classifiers and advanced deep learning language models to test its performance. The results report 89% accuracy when applied to the testing dataset using the ARBERT model.","PeriodicalId":355149,"journal":{"name":"Workshop on Arabic Natural Language Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Arabic Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.wanlp-1.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Developing a system for sentiment analysis is very challenging for the Arabic language due to the limitations in the available Arabic datasets. Many Arabic dialects are still not studied by researchers in Arabic sentiment analysis due to the complexity of annotators’ recruitment process during dataset creation. This paper covers the research gap in sentiment analysis for the Kuwaiti dialect by proposing a weak supervised approach to develop a large labeled dataset. Our dataset consists of over 16.6k tweets with 7,905 negatives, 7,902 positives, and 860 neutrals that spans several themes and time frames to remove any bias that might affect its content. The annotation agreement between our proposed system’s labels and human-annotated labels reports 93% for the pairwise percent agreement and 0.87 for Cohen’s kappa coefficient. Furthermore, we evaluate our dataset using multiple traditional machine learning classifiers and advanced deep learning language models to test its performance. The results report 89% accuracy when applied to the testing dataset using the ARBERT model.
科威特方言情感分析的弱监督迁移学习方法
由于可用的阿拉伯语数据集的限制,开发一个用于情感分析的系统对阿拉伯语来说是非常具有挑战性的。由于在数据集创建过程中注释者招募过程的复杂性,许多阿拉伯语方言在阿拉伯语情感分析中仍然没有被研究人员研究。本文通过提出一种弱监督方法来开发大型标记数据集,弥补了科威特方言情感分析的研究空白。我们的数据集由超过16.6k条推文组成,其中包括7905条负面推文、7902条正面推文和860条中性推文,这些推文跨越了多个主题和时间框架,以消除可能影响其内容的任何偏见。我们提出的系统标签和人类注释标签之间的标注一致性报告成对百分比一致性为93%,科恩kappa系数为0.87。此外,我们使用多个传统机器学习分类器和高级深度学习语言模型来评估我们的数据集,以测试其性能。当使用ARBERT模型应用于测试数据集时,结果报告准确率为89%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信