A Comparative Analysis of Word Embedding Representations in Authorship Attribution of Bengali Literature

2018 21st International Conference of Computer and Information Technology (ICCIT); Pub Date: 2018-12-01; DOI: 10.1109/ICCITECHN.2018.8631977

Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Md Saiful Islam

Citations: 17

Abstract

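In authorship attribution, the deep layers of a neural network can extract features from word embeddings and use them to learn authors' stylometric patterns from the context and co-occurrence of words. In this paper, we investigate how different types of word embeddings affect authorship attribution of Bengali literature, specifically the skip-gram and continuous bag-of-words (CBOW) models generated by Word2Vec and fastText, along with the word vectors generated by GloVe. We experiment with dense neural network models, such as convolutional and recurrent neural networks, analyse how the different word embedding models affect classifier performance, and discuss their properties in this classification task. The experiments are performed on a data set we prepared, consisting of 2,400 online blog articles by 6 contemporary authors.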
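To make the distinction between the two embedding training schemes named in the abstract more concrete, the following is a minimal sketch of how skip-gram and CBOW build their training examples from a tokenized sentence. It is not the authors' code: the function names, the window size, and the toy English sentence are assumptions chosen for illustration, and the paper itself works with Bengali blog text and the Word2Vec and fastText toolkits.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: predict each context word from the center word.

    Produces one (center, context) pair per context word.
    """
    return [
        (tokens[i], ctx)
        for i in range(len(tokens))
        for ctx in context_window(tokens, i, window)
    ]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: predict the center word from its surrounding context words.

    Produces one (context_words, center) example per token position.
    """
    return [
        (context_window(tokens, i, window), tokens[i])
        for i in range(len(tokens))
    ]


# Toy sentence (hypothetical; stands in for a tokenized Bengali article).
tokens = ["the", "author", "writes", "short", "essays"]

for center, ctx in skipgram_pairs(tokens, window=1):
    print("skip-gram:", center, "->", ctx)

for ctx_words, center in cbow_examples(tokens, window=1):
    print("CBOW:", ctx_words, "->", center)
```

Skip-gram yields many small examples per position, while CBOW yields a single example per position with the context words pooled together. This difference in how context is consumed is one reason the two variants can behave differently as inputs to a downstream classifier.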
