An empirical evaluation of ensemble bagging-based model for authorship attribution on Twitter

Anoual El Kah, Imad Zeroual
Published in: 2021 Fifth International Conference On Intelligent Computing in Data Sciences (ICDS), 2021-10-20
DOI: 10.1109/ICDS53782.2021.9626735 (https://doi.org/10.1109/ICDS53782.2021.9626735)
Citations: 2

Abstract

Authorship Attribution (AA) of short texts such as SMS, chat messages, and social media posts has become a relevant research topic, adding new dimensions to the field. However, AA of Arabic tweets has not been well investigated and lags behind work on longer texts such as ancient books, poems, and news articles, or even on comparably short texts such as the fatwa (i.e., a legal decree in the religion of Islam). This paper presents the advantage of using a bagging ensemble model over a single-learner model to increase the accuracy of AA of Arabic tweets. To this end, we evaluated the performance of a bagging ensemble model using three state-of-the-art classification approaches as base classifiers, namely Naïve Bayes (NB), Support Vector Machines (SVM), and Decision Trees (DT). In the experiments conducted, the bagging classifier that used the SVM algorithm as its base model achieved the highest accuracy (95.03%) of all the classifiers evaluated. This accuracy is among the highest published in similar studies.
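
The bagging idea described in the abstract (train several copies of a base classifier on bootstrap samples, then combine them by majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English texts, the author labels, the character-bigram features, the add-one smoothing, and the choice of 15 estimators are all assumptions made for illustration. A small Naïve Bayes learner is used as the base classifier only because it fits in a few lines; the abstract reports SVM as the best-performing base model.

```python
import math
import random
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams (an assumed stylometric feature, not taken from the paper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over character bigrams."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)  # log prior
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class BaggingEnsemble:
    """Fit each base learner on a bootstrap sample; predict by majority vote."""

    def __init__(self, n_estimators=15, seed=0):
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)
        self.models = []

    def fit(self, texts, labels):
        n = len(texts)
        self.models = []
        for _ in range(self.n_estimators):
            # Bootstrap: draw n training indices with replacement.
            idx = [self.rng.randrange(n) for _ in range(n)]
            model = NaiveBayes().fit([texts[i] for i in idx], [labels[i] for i in idx])
            self.models.append(model)
        return self

    def predict(self, text):
        votes = Counter(model.predict(text) for model in self.models)
        return votes.most_common(1)[0][0]


# Hypothetical toy corpus: two "authors" with distinct writing habits.
texts = [
    "haha lol so funny", "lol haha nice one", "haha that is so lol",
    "indeed, therefore we proceed", "therefore, indeed it holds",
    "we proceed; indeed it holds",
]
labels = ["author_a"] * 3 + ["author_b"] * 3

ensemble = BaggingEnsemble(n_estimators=15, seed=0).fit(texts, labels)
print(ensemble.predict("lol haha"))
```

Because each base learner sees a different resample of the training tweets, their individual errors are partly decorrelated, and the vote tends to be more stable than any single learner. Swapping in a different base classifier only requires an object with the same `fit` and `predict` methods.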