A New Term Weight Scheme and Ensemble Technique for Authorship Identification
Authors: Hanan Alshaher, Jinsheng Xu
Published in: Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis
Publication date: 2020-03-09
DOI: https://doi.org/10.1145/3388142.3388159
Citation count: 3
Abstract
A few previous studies on authorship identification have applied term weighting to features. The present study introduces a new term weight scheme, called 1/sigma, that rescales the values of a feature set to a mean of zero and a standard deviation of one; in other words, it standardizes the values of a feature set. Three experiments demonstrated the robustness of the proposed scheme from different perspectives. They showed that 1/sigma performed well across different feature sets and classifiers when compared with two popular term weight schemes, TF and TF-IDF. Furthermore, 1/sigma worked successfully on two types of datasets: literary texts (fiction) and online messages (blogs, emails, and tweets). Although these experiments did not directly examine the effects of the numbers of documents and authors, the results suggest that these factors had no effect, because the scheme held up even though the numbers of documents and authors vary from dataset to dataset.
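The standardization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation of 1/sigma, so the code below assumes plain per-feature z-scoring, (x - mean) / sigma, applied column by column to a document-by-term matrix. The function name `standardize_columns`, the toy term-frequency matrix, the use of the population standard deviation, and the handling of constant columns are all assumptions made for illustration, not details taken from the paper.

```python
from statistics import fmean, pstdev


def standardize_columns(matrix):
    """Rescale each feature column to mean 0 and standard deviation 1.

    `matrix` is a list of rows (documents), each a list of feature values.
    Assumption: population standard deviation is used, and a constant
    column (sigma == 0) is mapped to all zeros to avoid dividing by zero.
    """
    n_features = len(matrix[0])
    # Transpose so that each inner list holds one feature across documents.
    columns = [[row[j] for row in matrix] for j in range(n_features)]

    scaled_columns = []
    for col in columns:
        mu = fmean(col)
        sigma = pstdev(col)
        if sigma == 0:
            scaled_columns.append([0.0 for _ in col])
        else:
            scaled_columns.append([(x - mu) / sigma for x in col])

    # Transpose back to one row per document.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical raw term counts: 3 documents x 3 terms.
tf = [
    [3, 0, 1],
    [1, 2, 1],
    [2, 4, 1],
]

weighted = standardize_columns(tf)
for row in weighted:
    print([round(value, 3) for value in row])
```

After this transformation every non-constant feature has the same scale, so a frequent term no longer dominates a rare one purely because its raw counts are larger. In practice the mean and sigma would normally be estimated on the training documents only and then reused for test documents.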