Authorship Identification using Recurrent Neural Networks
Shriya T. P. Gupta, J. Sahoo, R. Roul
Proceedings of the 2019 3rd International Conference on Information System and Data Mining, 2019-04-06
DOI: 10.1145/3325917.3325935
Authorship identification is the process of revealing the hidden identity of authors from a corpus of literary data, based on a stylometric analysis of the text. It has essential applications in fields such as cyber-forensics, plagiarism detection, and political socialization. This paper applies a deep learning approach to authorship identification by defining a characterization of texts that captures the distinctive style of an author. The proposed model uses an index-based word embedding for the C50 and BBC datasets, which is fed as input to article-level Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) network models. This new embedding variant is compared against the standard approach of pre-trained word embeddings.
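The abstract does not spell out how the index-based embedding is built, so the following is only a minimal sketch of one plausible reading: each word is mapped to an integer index from a vocabulary, and each article becomes a fixed-length sequence of indices that a recurrent model (LSTM or GRU) could consume. The names `build_vocab` and `encode`, the reserved `PAD` and `UNK` indices, the frequency-based ordering, and the whitespace tokenization are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter

# Reserved indices (an assumed convention, not taken from the paper).
PAD, UNK = 0, 1


def build_vocab(articles, max_words=None):
    """Assign an integer index to each word, most frequent first.

    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(w for text in articles for w in text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_words is not None:
        ranked = ranked[:max_words]
    # Indices start at 2 because 0 and 1 are reserved for PAD and UNK.
    return {word: i + 2 for i, (word, _) in enumerate(ranked)}


def encode(text, vocab, length):
    """Turn one article into a fixed-length list of word indices."""
    ids = [vocab.get(w, UNK) for w in text.lower().split()][:length]
    return ids + [PAD] * (length - len(ids))


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat down"]
    vocab = build_vocab(corpus)
    print(encode("the cat ran", vocab, 5))
```

In this reading, the resulting index sequences replace pre-trained word vectors as the model input; the comparison with pre-trained embeddings mentioned in the abstract would then come down to swapping this encoding step.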