Combination of Stylo-based Features and Frequency-based Features for Identifying the Author of Short Arabic Text
Mohammed Al-Sarem, Walid Cherif, Ahmed Abdel Wahab, Abdel-Hamid M. Emara, M. Kissi
Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications, 2018-10-24
DOI: 10.1145/3289402.3289500
Citations: 6
Abstract
Authorship verification (AV) is a binary classification task that aims to determine whether a given text was written by a specific author. For Arabic, this task has received little attention, especially for short texts. The current study examines the performance of authorship verification on short Arabic documents. A Bagging classifier is applied to two different datasets. First, a balanced dataset is examined with different feature combinations drawn from two types of authorship features: stylo-based features (SF) and frequency-based features (FF). Second, the same experiment is conducted on an unbalanced dataset.
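The pipeline outlined in the abstract (SF and FF vectors combined, then classified by bagging) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the individual features or the base learner, so the three stylometric measures, the word-frequency vocabulary, the nearest-centroid base learner, and all function and class names here are hypothetical. In practice a library implementation such as scikit-learn's `BaggingClassifier` would likely be used instead of the hand-rolled ensemble.

```python
import random
import string
from collections import Counter

# Assumed punctuation set: ASCII punctuation plus common Arabic marks.
PUNCT = set(string.punctuation + "،؛؟")


def stylo_features(text):
    """Assumed SF vector: word count, average word length, punctuation ratio."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / max(len(words), 1)
    punct_ratio = sum(ch in PUNCT for ch in text) / max(len(text), 1)
    return [float(len(words)), avg_len, punct_ratio]


def frequency_features(text, vocabulary):
    """Assumed FF vector: relative frequency of each vocabulary word."""
    words = text.split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in vocabulary]


def combined_features(text, vocabulary):
    """Concatenate SF and FF into one feature vector."""
    return stylo_features(text) + frequency_features(text, vocabulary)


class NearestCentroid:
    """Tiny base learner (an assumption): picks the label with the closest mean vector."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class Bagging:
    """Bootstrap aggregating: train each model on a resample, then majority-vote."""

    def __init__(self, n_estimators=15, seed=0):
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n = len(X)
        self.models = []
        for _ in range(self.n_estimators):
            # Sample n training indices with replacement (one bootstrap replicate).
            idx = [self.rng.randrange(n) for _ in range(n)]
            model = NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx])
            self.models.append(model)
        return self

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        return votes.most_common(1)[0][0]
```

For verification, the label would be 1 when a text is attributed to the candidate author and 0 otherwise. Each text is mapped through `combined_features` before calling `Bagging.fit`, and dropping either half of the concatenation gives the SF-only and FF-only configurations for comparison.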