An empirical evaluation of ensemble bagging-based model for authorship attribution on Twitter
Authors: Anoual El Kah, Imad Zeroual
Published in: 2021 Fifth International Conference On Intelligent Computing in Data Sciences (ICDS), 2021-10-20
DOI: https://doi.org/10.1109/ICDS53782.2021.9626735
Citations: 2
Abstract
Authorship Attribution (AA) of short texts such as SMS, chat messages, and social media posts has become a relevant research topic, adding new dimensions to the field. However, AA of Arabic tweets has not been well investigated and lags behind work on longer texts such as ancient books, poems, and news articles, and even behind work on comparable short texts such as the fatwa (i.e., a legal decree in the religion of Islam). This paper presents the advantage of using a bagging ensemble model over a single-learner model to increase the accuracy of AA of Arabic tweets. To that end, we evaluated the performance of a bagging ensemble model using three state-of-the-art classification approaches as base classifiers, namely Naïve Bayes (NB), Support Vector Machines (SVM), and Decision Trees (DT). In the experiments conducted, the proposed bagging classifier with SVM as its base model achieved the highest accuracy (95.03%) of all the classifiers evaluated. This accuracy is among the highest reported in similar studies.
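To make the bagging idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It trains several copies of one base classifier on bootstrap resamples of the training set and combines their predictions by majority vote. Everything specific here is an assumption for illustration: the character-trigram features, the class names `NaiveBayes` and `BaggingEnsemble`, the number of estimators, and the toy English texts standing in for tweets. Naïve Bayes is used as the base learner only because it fits in a few lines of standard-library Python; the abstract reports that SVM performed best as the base model.

```python
import math
import random
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split a text into overlapping character n-grams (an assumed feature choice)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, texts, authors):
        self.doc_counts = Counter(authors)
        self.gram_counts = defaultdict(Counter)
        for text, author in zip(texts, authors):
            self.gram_counts[author].update(char_ngrams(text))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        best_author, best_score = None, -math.inf
        for author in sorted(self.doc_counts):
            counts = self.gram_counts[author]
            # The extra +1 reserves probability mass for n-grams unseen in training.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(self.doc_counts[author] / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_author, best_score = author, score
        return best_author


class BaggingEnsemble:
    """Bagging: fit each base model on a bootstrap resample, then majority-vote."""

    def __init__(self, n_estimators=15, seed=0):
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)
        self.models = []

    def fit(self, texts, authors):
        n = len(texts)
        self.models = []
        for _ in range(self.n_estimators):
            # Bootstrap: draw n training indices with replacement.
            idx = [self.rng.randrange(n) for _ in range(n)]
            sample_texts = [texts[i] for i in idx]
            sample_authors = [authors[i] for i in idx]
            self.models.append(NaiveBayes().fit(sample_texts, sample_authors))
        return self

    def predict_one(self, text):
        votes = Counter(model.predict_one(text) for model in self.models)
        return votes.most_common(1)[0][0]


# Hypothetical toy data: two invented "authors" with different writing habits.
train_texts = [
    "omg cant wait 4 the match 2nite!!!",
    "lol that goal was sooo lucky!!!",
    "ugh traffic again, gonna be late lol",
    "The committee will publish its findings tomorrow.",
    "Inflation figures were revised downward this quarter.",
    "The minister declined to comment on the report.",
]
train_authors = ["author_a", "author_a", "author_a",
                 "author_b", "author_b", "author_b"]

ensemble = BaggingEnsemble(n_estimators=15, seed=0).fit(train_texts, train_authors)
prediction = ensemble.predict_one("lol cant believe the match 2nite!!!")
print(prediction)
```

Because each base model sees a slightly different resample, their individual errors are partly decorrelated, and the vote tends to be more stable than any single learner. Swapping in a different base classifier only requires an object with the same `fit` and `predict_one` methods.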