Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Md Saiful Islam
2018 21st International Conference of Computer and Information Technology (ICCIT), published 2018-12-01. DOI: 10.1109/ICCITECHN.2018.8631977
A Comparative Analysis of Word Embedding Representations in Authorship Attribution of Bengali Literature
Word embeddings can be fed to the deep layers of neural networks, which extract features from them to learn authors' stylometric patterns based on the context and co-occurrence of words, for the task of authorship attribution. In this paper, we investigate the effects of different types of word embeddings on authorship attribution of Bengali literature, specifically the skip-gram and continuous bag-of-words (CBOW) models generated by Word2Vec and fastText, along with the word vectors generated by GloVe. We experiment with dense neural network models, such as convolutional and recurrent neural networks, analyse how different word embedding models affect the performance of the classifiers, and discuss their properties in this classification task. The experiments are performed on a data set we prepared, consisting of 2400 online blog articles from 6 contemporary authors.
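
To make the distinction between the two embedding objectives named above concrete, the following is a minimal sketch of how skip-gram and CBOW frame their training examples from a token sequence. It is an illustration only, not the authors' pipeline: the function names `skipgram_pairs` and `cbow_examples`, the `window` parameter, and the placeholder tokens are all assumptions, and real Word2Vec or fastText training adds steps that are omitted here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs.

    Skip-gram is trained to predict each surrounding word from the center word,
    so every center word yields one pair per neighbour inside the window.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, center) examples.

    CBOW is trained to predict the center word from its surrounding words,
    so every position yields a single example with the whole context as input.
    """
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # a one-token sequence has no context to learn from
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    # Placeholder tokens (hypothetical); a real run would use tokenised Bengali text.
    sample = ["w1", "w2", "w3", "w4"]
    print(skipgram_pairs(sample, window=1))
    print(cbow_examples(sample, window=1))
```

Under this framing, skip-gram produces many single-word prediction targets per position, whereas CBOW produces one target per position from the combined context, which is one reason the two variants can yield different vectors from the same corpus.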