The Role of Traditional Features in Authorship Attribution

2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). Pub Date: 2020-07-01. DOI: 10.1109/ICEIEC49280.2020.9152360

L. Shang, Lizhen Liu, Wei Song, Miaomiao Cheng

Abstract

Authorship attribution is an important direction in natural language processing and has received much attention. Current research methods are mainly based on neural networks and have made great progress. Compared with traditional methods, however, these methods have limited interpretability: it is difficult to know which features a neural network model actually uses for classification, or how weight is distributed across those features. Without an exact understanding of the model, it is difficult to use it in key fields. We used 162 manually defined features at 5 levels, together with character-level n-gram features, to conduct authorship attribution experiments on a Chinese data set, and carried out a comparative study of these features to identify which ones play an important role in the authorship attribution of Chinese text.
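To make the idea of character-level n-gram features concrete, the following is a minimal sketch, not the authors' pipeline. The abstract does not name a classifier or a value of n, so the bigram setting, the cosine-similarity author profiles, the function names, and the two toy sentences are all assumptions made for illustration. The point is only that such features stay inspectable: one can list exactly which n-grams drove a decision.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(samples, n=2):
    """Sum n-gram counts over each author's known texts."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(text, n))
    return profiles


def attribute(text, profiles, n=2):
    """Return the best-matching author and all per-author similarity scores."""
    query = char_ngrams(text, n)
    scores = {author: cosine(query, prof) for author, prof in profiles.items()}
    return max(scores, key=scores.get), scores


def shared_ngrams(text, profile, n=2, top=3):
    """List the n-grams that contribute most to the match, for inspection."""
    query = char_ngrams(text, n)
    contrib = {g: c * profile[g] for g, c in query.items() if g in profile}
    return sorted(contrib, key=contrib.get, reverse=True)[:top]


# Hypothetical toy data: two invented "authors" with one sentence each.
samples = [
    ("author_a", "我觉得这个问题其实很简单，其实不用多想。"),
    ("author_b", "然而，此事并非如此简单，然而我们仍需谨慎。"),
]
profiles = build_profiles(samples)

unknown = "其实这件事很简单，其实没什么。"
best, scores = attribute(unknown, profiles)
evidence = shared_ngrams(unknown, profiles[best])
print(best, scores, evidence)
```

Unlike the hidden weights of a neural model, the `evidence` list names the character sequences that the decision rests on, which is the kind of interpretability the abstract argues for. A real study would use far more text per author and a trained classifier over the full feature set.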

