Mohammed Al-Sarem, Walid Cherif, Ahmed Abdel Wahab, Abdel-Hamid M. Emara, M. Kissi
Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications, published 2018-10-24
DOI: 10.1145/3289402.3289500 (https://doi.org/10.1145/3289402.3289500)
Combination of Stylo-based Features and Frequency-based Features for Identifying the Author of Short Arabic Text
Authorship verification (AV) is a binary classification task that aims to determine whether a given text was written by a specific author. For Arabic, this task has received little attention, especially for short texts. The current study examines the performance of authorship verification on short Arabic documents. The Bagging classifier is applied to two datasets. First, a balanced dataset is examined with different feature combinations drawn from two feature types: stylo-based features (SF) and frequency-based features (FF). Second, the same experiment is conducted with an unbalanced dataset.
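The abstract does not list the individual features, so the following is only a minimal sketch of what combining the two feature types could look like for one short text. The specific features (word count, average word length, punctuation ratio, relative word frequencies), the function names, the punctuation set, and the sample sentence are all illustrative assumptions, not the authors' feature set.

```python
from collections import Counter
import string

# Assumption: ASCII punctuation plus a few common Arabic punctuation marks.
PUNCTUATION = set(string.punctuation) | {"،", "؛", "؟"}
PUNCT_CHARS = "".join(PUNCTUATION)


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation from each token."""
    tokens = (w.strip(PUNCT_CHARS) for w in text.split())
    return [t for t in tokens if t]


def stylo_features(text):
    """Hypothetical stylo-based features (SF) describing the shape of the text."""
    tokens = tokenize(text)
    n_punct = sum(1 for ch in text if ch in PUNCTUATION)
    return {
        "sf_word_count": float(len(tokens)),
        "sf_avg_word_len": sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0,
        "sf_punct_ratio": n_punct / len(text) if text else 0.0,
    }


def frequency_features(text, vocabulary):
    """Hypothetical frequency-based features (FF): relative frequency of each term."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {"ff_" + term: (counts[term] / total if total else 0.0) for term in vocabulary}


def combined_features(text, vocabulary):
    """Merge SF and FF into one feature dictionary (the SF+FF combination)."""
    features = stylo_features(text)
    features.update(frequency_features(text, vocabulary))
    return features


# Illustrative input only; not taken from the paper's datasets.
sample = "هذا نص قصير، وهذا نص آخر."
vocab = ["نص", "هذا"]
features = combined_features(sample, vocab)
print(features)
```

Feature dictionaries of this kind, computed per text and labeled by whether the candidate author wrote the text, could then be vectorized and passed to a bagging ensemble; dropping either the `sf_` or the `ff_` keys gives the single-feature-type configurations for comparison.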