The Role of Traditional Features in Authorship Attribution
L. Shang, Lizhen Liu, Wei Song, Miaomiao Cheng
2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC), 2020-07-01
DOI: 10.1109/ICEIEC49280.2020.9152360 (https://doi.org/10.1109/ICEIEC49280.2020.9152360)
Citations: 0
Abstract
Authorship attribution is an important direction in natural language processing and has received considerable attention. Current research methods are mainly based on neural networks and have made great progress. Compared with traditional methods, however, these methods are limited in interpretability: it is difficult to know which features a neural network model actually uses for classification, or how weight is distributed across those features. Without an exact understanding of the model, it is difficult to apply it in key fields. We used 162 manually defined features at 5 levels, together with character-level n-gram features, to conduct authorship attribution experiments on a Chinese data set, and carried out a comparative study of these features to identify those that play an important role in authorship attribution of Chinese text.
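As a rough illustration of what character-level n-gram features can look like for Chinese text, the following minimal sketch counts overlapping character n-grams and normalizes them to relative frequencies. It is not the authors' implementation: the function names `char_ngrams` and `relative_frequencies` and the sample sentence are hypothetical, and the abstract does not say which values of n, which weighting, or which classifier were used.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in `text`.

    Chinese text needs no word segmentation for this step,
    because each n-gram is a window of n consecutive characters.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A text shorter than n yields an empty range, hence an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Convert raw n-gram counts to relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sample sentence, used only for illustration.
sample = "我们喜欢我们的学校"
bigrams = char_ngrams(sample, 2)
features = relative_frequencies(bigrams)
print(bigrams.most_common(1))
```

Frequency vectors of this kind, one per document, could then be passed to an interpretable classifier so that the weight of each n-gram can be inspected, which is the sort of feature comparison the abstract describes.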