A Comparative Analysis of Word Embedding Representations in Authorship Attribution of Bengali Literature

Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Md Saiful Islam
2018 21st International Conference of Computer and Information Technology (ICCIT), published 2018-12-01
DOI: 10.1109/ICCITECHN.2018.8631977 (https://doi.org/10.1109/ICCITECHN.2018.8631977)
Citations: 17

Abstract

Deep neural networks can extract features from word embeddings to learn authors' stylometric patterns from the context and co-occurrence of words, which makes embeddings useful for authorship attribution. In this paper, we investigate how different types of word embeddings affect authorship attribution of Bengali literature, specifically the skip-gram and continuous bag-of-words (CBOW) models generated by Word2Vec and fastText, along with the word vectors generated by GloVe. We experiment with dense neural network models, such as convolutional and recurrent neural networks, analyse how the different word embedding models affect classifier performance, and discuss their properties in this classification task. The experiments are performed on a data set we prepared, consisting of 2400 online blog articles from 6 contemporary authors.
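
The embedding families named in the abstract differ mainly in how they turn a token sequence into training examples. The sketch below is illustrative only and is not taken from the paper: the function names, the window size, and the n-gram range are assumptions, and the toy tokens stand in for tokenized Bengali text. Skip-gram predicts each context word from the center word, CBOW predicts the center word from its surrounding context, and fastText additionally breaks each word into character n-grams wrapped in boundary markers.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs: one pair per context word in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context_words, center) examples: the whole window predicts the center."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        if context:
            examples.append((context, center))
    return examples


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return fastText-style character n-grams of the word wrapped in '<' and '>'.

    The 3..6 range is an assumed default for illustration.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


if __name__ == "__main__":
    toy = ["a", "b", "c"]  # placeholder tokens, not real corpus data
    print(skipgram_pairs(toy, window=1))
    print(cbow_examples(toy, window=1))
    print(char_ngrams("where", 3, 3))
```

Because subword n-grams are shared across words, a fastText-style model can build a vector for a word it never saw in training, which is one property worth keeping in mind when comparing it with Word2Vec and GloVe on a morphologically rich language.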