Classifying Arabian Gulf Tweets to Detect People's Trends: A case study

Khaled Balhaf, Omar A. Darwish, Emad Rawashdeh, Mohammad Abu Awad, Dirar A. Darweesh, Yahya M. Tashtoush, Saif Rawashdeh
Published in: 2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS)
Publication date: 2022-11-29
DOI: 10.1109/SNAMS58071.2022.10062585 (https://doi.org/10.1109/SNAMS58071.2022.10062585)

Abstract

Media and business companies increasingly use social media to reach large numbers of users and maximize profit. These companies look for the best ways to satisfy their users' requirements, but the sheer number of users on platforms such as Twitter makes those requirements difficult to understand. The goal of our research project is therefore to build a classifier that can detect Arabian trends among Twitter users in the Gulf area. Such a classifier can help these companies deliver suitable products and media content, such as photos and videos, according to users' trends. Using our own Java-based tool, we collected a significant dataset of tweets. We then ran two tweet-classification experiments to compare the effects of balanced and imbalanced training data and to measure the effect of data size on classifier accuracy. In both experiments, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes algorithms were used as classifiers, and the Light and Root Stemmers were applied with each classifier. The first experiment used a small, imbalanced data set with four classes: Sport, Politics, Islam, and Culture. Its best result, an accuracy of 76.27%, was achieved by Naïve Bayes with the Light Stemmer. The second experiment used a large, balanced data set with the same classifiers and one additional class, Economics. There, the best accuracy (81.17%) was obtained by SVM with the Light Stemmer. The Light Stemmer achieved the best outcomes for all classifiers because almost all of the tweets were written in dialects.
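To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch of light stemming followed by Naïve Bayes classification. It is an illustration only, not the authors' implementation: the affix lists, the `light_stem` function, the `NaiveBayes` class, and the four toy training sentences are all assumptions introduced here, and the paper's own tool is Java-based. The stemmer strips at most one common prefix and one common suffix, which is the general idea behind light stemming; a root stemmer would instead reduce each word to its root.

```python
import math
from collections import Counter, defaultdict

# Assumed affix lists for illustration; the abstract does not say which
# lists the authors' Light Stemmer uses.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(light_stem(w) for w in doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        tokens = [light_stem(w) for w in doc.split()]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy examples, not taken from the paper's dataset.
train = [
    ("فاز الفريق في المباراة", "Sport"),
    ("سجل اللاعب هدفا في المباراة", "Sport"),
    ("اجتمع الوزراء لمناقشة الانتخابات", "Politics"),
    ("أعلنت الحكومة موعد الانتخابات", "Politics"),
]
model = NaiveBayes().fit([text for text, _ in train], [label for _, label in train])

print(model.predict("مباراة الفريق اليوم"))
print(model.predict("الحكومة والانتخابات"))
```

Because the stemmer maps variants such as "المباراة" and "مباراة" to the same token, the classifier can match words that differ only in attached articles or endings. The same stemmed tokens could equally feed an SVM or KNN classifier.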