Identifying Author in Bengali Literature by Bi-LSTM with Attention Mechanism

Ibrahim Al Azhar, Sohel Ahmed, Md Saiful Islam, Aisha Khatun
DOI: 10.1109/ICCIT54785.2021.9689840
Published in: 2021 24th International Conference on Computer and Information Technology (ICCIT), 2021-12-18
Citations: 1

Abstract

Authorship Attribution is the task of determining the author of an unknown text from that author's writing patterns. It is a well-established task for high-resource languages like English, but it is challenging for low-resource languages like Bengali. In this paper, we propose a Bi-directional Long Short-Term Memory (Bi-LSTM) model with a self-attention mechanism to address this problem. GloVe embedding vectors encode the semantic and syntactic knowledge of words, which are then fed into the Bi-LSTM models. Moreover, the attention mechanism enhances the model's ability to learn complex linguistic patterns through learnable parameters, which give lower weights to common words and higher weights to keywords that capture an author's stylistic components. This improves performance by extracting contextual features. We evaluate our model on multiple datasets and experiment with various architectures. Our proposed model outperforms the state-of-the-art model by 12.14%-20.24% on the BAAD6 author dataset and by 1.05%-7.34% on the BAAD16 author dataset, with a best accuracy of 97.99%. The experimental results demonstrate that the Bi-LSTM model's attention mechanism notably boosts performance. (The source code is shared as a free tool at https://github.com/IbrahimAlAzhar/AuthorshipAttribution)
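The attention step the abstract describes (learnable parameters that weight each token's Bi-LSTM hidden state, down-weighting common words and up-weighting stylistic keywords) can be sketched as additive self-attention pooling. This is a minimal NumPy illustration under assumed shapes and parameter names, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(H, W, v):
    """Additive self-attention pooling over Bi-LSTM hidden states.

    H : (T, d) hidden states, one row per token
    W : (d, a) learnable projection (assumed name)
    v : (a,)   learnable scoring vector (assumed name)
    Returns per-token attention weights (T,) and the pooled
    document vector (d,) fed to the author classifier.
    """
    scores = np.tanh(H @ W) @ v   # one scalar score per token
    alpha = softmax(scores)       # weights sum to 1 across tokens
    context = alpha @ H           # attention-weighted sum of states
    return alpha, context

# Toy run: 5 tokens, Bi-LSTM hidden size 8, attention size 4.
rng = np.random.default_rng(0)
T, d, a = 5, 8, 4
H = rng.normal(size=(T, d))
W = rng.normal(size=(d, a))
v = rng.normal(size=a)
alpha, context = attention_pool(H, W, v)
```

After training, `alpha` is what the abstract refers to: tokens carrying an author's stylistic signal receive larger weights, and `context` summarizes the text for classification.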