An empirical evaluation of ensemble bagging-based model for authorship attribution on Twitter

2021 Fifth International Conference On Intelligent Computing in Data Sciences (ICDS). Pub Date: 2021-10-20. DOI: 10.1109/ICDS53782.2021.9626735

Anoual El Kah, Imad Zeroual


Citations: 2

Abstract

Authorship attribution (AA) of short texts such as SMS, chat messages, and social media posts has become an active research topic, adding new dimensions to the field. However, AA of Arabic tweets remains under-investigated compared with longer texts such as ancient books, poems, and news articles, or even comparably short texts such as the fatwa (i.e., a legal decree in the religion of Islam). This paper presents the advantage of using a bagging ensemble model over a single-learner model to increase the accuracy of AA on Arabic tweets. To that end, we evaluated the performance of a bagging ensemble model with three state-of-the-art classification approaches as base classifiers, namely Naïve Bayes (NB), Support Vector Machines (SVM), and Decision Trees (DT). In the experiments conducted, the bagging classifier that used SVM as its base model achieved the highest accuracy (95.03%) of all the classifiers evaluated. This accuracy is among the highest reported in similar studies.
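To make the bagging idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the features, hyperparameters, or dataset, so everything here is an assumption for illustration: the character-bigram features, the tiny Naive Bayes base learner, the toy texts, and the names `char_ngrams`, `NaiveBayes`, and `BaggingEnsemble` are all hypothetical. Each base model is trained on a bootstrap sample (drawn with replacement) and the ensemble predicts by majority vote.

```python
import math
import random
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Overlapping character n-grams; the bigram choice is an assumption."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Tiny multinomial Naive Bayes over character n-grams (illustrative)."""

    def fit(self, texts, labels):
        self.priors = Counter(labels)        # training texts per author
        self.counts = defaultdict(Counter)   # n-gram counts per author
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.priors.values())
        grams = char_ngrams(text)
        best_label, best_score = None, -math.inf
        for label in sorted(self.priors):
            # Add-one smoothing, with one extra slot for unseen n-grams.
            denom = sum(self.counts[label].values()) + len(self.vocab) + 1
            score = math.log(self.priors[label] / total_docs)
            for gram in grams:
                score += math.log((self.counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class BaggingEnsemble:
    """Bootstrap aggregation: resample, train one model each, majority vote."""

    def __init__(self, make_base, n_estimators=15, seed=0):
        self.make_base = make_base           # callable returning a new base model
        self.n_estimators = n_estimators
        self.seed = seed

    def fit(self, texts, labels):
        rng = random.Random(self.seed)
        n = len(texts)
        self.models = []
        for _ in range(self.n_estimators):
            # Bootstrap sample: n indices drawn with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            sample_texts = [texts[i] for i in idx]
            sample_labels = [labels[i] for i in idx]
            self.models.append(self.make_base().fit(sample_texts, sample_labels))
        return self

    def predict(self, text):
        votes = Counter(model.predict(text) for model in self.models)
        return votes.most_common(1)[0][0]


# Toy data (invented for this sketch): two "authors" with distinct habits.
texts = [
    "haha ha", "ha haha", "hahaha", "ha ha haha", "haha haha", "hahaha ha",
    "well, indeed", "indeed so", "well well", "so, indeed", "well so",
    "indeed, well",
]
authors = ["A"] * 6 + ["B"] * 6

model = BaggingEnsemble(NaiveBayes, n_estimators=15, seed=0).fit(texts, authors)
print(model.predict("ha hahaha"), model.predict("well indeed"))
```

Passing a factory (`make_base`) rather than a fitted model keeps the ensemble independent of the base learner, so an SVM or decision-tree class with the same `fit`/`predict` interface could be substituted, mirroring the three base classifiers compared in the abstract.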

