Authorship Attribution on Bengali Literature using Stylometric Features and Neural Network

M. Islam, Md. Minhazul Kabir, Md Saiful Islam, Ayesha Tasnim
Published in: 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), 2018-09-01
DOI: 10.1109/CEEICT.2018.8628106
Citations: 3

Abstract

Every writer has a personal writing style. In the digital era, authorship attribution is an important problem in natural language processing: a person can publish another writer's work under a false name, and the real author is then difficult to identify. To characterize a writer's style, a range of stylometric features was analyzed, including frequently used words, word length, sentence length, WH (question) words, and numbers, among others. A statistical analysis of articles by different writers was carried out to help identify the real author. An artificial neural network model was then developed to identify the writer of an unknown document; it achieved an accuracy above 85% for each writer. The study examines the writings of five Bangladeshi authors: Imon Jubayer (IJ), Humayun Ahmed (HA), Muhammed Zafar Iqbal (MZI), Kazi Nazrul Islam (KNI), and Hassan Mahbub (HM).
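
The features listed in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stylometric_features`, the `WH_WORDS` list, the punctuation handling, and the choice of the danda (।), `?`, and `!` as sentence delimiters are all assumptions made for illustration, and the paper's actual feature definitions may differ.

```python
import re
from collections import Counter

# Hypothetical, illustrative list of Bengali question ("WH") words.
# The abstract does not give the list used in the paper.
WH_WORDS = {"কে", "কী", "কি", "কেন", "কোথায়", "কখন", "কীভাবে"}


def stylometric_features(text, top_k=5):
    """Compute a few simple stylometric features for one document."""
    # Assumption: sentences end with the Bengali danda, '?' or '!'.
    sentences = [s for s in re.split(r"[।?!]", text) if s.strip()]

    # Tokenize on whitespace and strip common punctuation. A \w-based regex
    # is avoided because it would split Bengali words at vowel signs.
    words = []
    for sentence in sentences:
        for token in sentence.split():
            token = token.strip(",;:\"'()-")
            if token:
                words.append(token)

    n_words = len(words)
    n_sentences = len(sentences)
    return {
        # Word length is measured in Unicode code points, not graphemes.
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Sentence length is measured in words.
        "avg_sentence_len": n_words / n_sentences if n_sentences else 0.0,
        "wh_count": sum(1 for w in words if w in WH_WORDS),
        # Tokens containing any digit (Bengali or ASCII) count as numbers.
        "number_count": sum(1 for w in words if any(c.isdigit() for c in w)),
        "top_words": Counter(words).most_common(top_k),
    }


if __name__ == "__main__":
    print(stylometric_features("কে এল। ৫ জন এল।"))
```

A feature dictionary like this, computed per document, could be turned into a numeric vector and passed to a classifier such as a neural network; how the paper encodes its features for the network is not stated in the abstract.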