Authorship Attribution on Bengali Literature using Stylometric Features and Neural Network
M. Islam, Md. Minhazul Kabir, Md Saiful Islam, Ayesha Tasnim
2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), published 2018-09-01
DOI: 10.1109/CEEICT.2018.8628106 (https://doi.org/10.1109/CEEICT.2018.8628106)
Citations: 3
Abstract
Every writer has a distinctive personal writing style. Authorship attribution is an important problem in natural language processing: a plagiarist can publish another writer's work under a false name, which makes the real author difficult to identify. To characterize a writer's style, various features were analyzed, such as frequently used words, word length, sentence length, WH (interrogative) words, and numbers. A statistical analysis of articles by different writers was carried out to help identify the real author. An artificial neural network model was then developed to identify the writer of an unknown document, and it achieved an accuracy above 85% for each writer. The study examines the writings of five Bangladeshi authors: Imon Jubayer (IJ), Humayun Ahmed (HA), Muhammed Zafar Iqbal (MZI), Kazi Nazrul Islam (KNI), and Hassan Mahbub (HM).
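
The feature types named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `stylometric_features`, the short `WH_WORDS` list, the whitespace tokenization, and the choice of ratios as the feature encoding are all assumptions made for illustration, since the abstract does not give these details.

```python
from collections import Counter
import re

# Assumed, illustrative list of Bengali question ("WH") words; not taken from the paper.
WH_WORDS = {"কি", "কে", "কেন", "কখন"}
# Punctuation stripped from token edges; "।" is the Bengali danda (full stop).
PUNCTUATION = "।,;:?!\"'()[]-"
# A sentence ends at a danda, question mark, or exclamation mark.
SENTENCE_END = re.compile(r"[।?!]+")


def stylometric_features(text, vocabulary):
    """Return a dict of simple style features for one document.

    `vocabulary` is a hypothetical list of frequently used words whose
    relative frequencies are added as extra features.
    """
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    tokens = [t.strip(PUNCTUATION) for t in text.split()]
    tokens = [t for t in tokens if t]

    # Guard against division by zero on empty input.
    n_tokens = max(len(tokens), 1)
    n_sentences = max(len(sentences), 1)
    counts = Counter(tokens)

    features = {
        "sentence_count": len(sentences),
        "token_count": len(tokens),
        # Word length is measured in Unicode code points, a rough proxy for Bengali.
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "avg_sentence_length": len(tokens) / n_sentences,
        "wh_word_ratio": sum(counts[w] for w in WH_WORDS) / n_tokens,
        # str.isdigit() is true for Bengali digits as well as ASCII digits.
        "number_ratio": sum(1 for t in tokens if any(c.isdigit() for c in t)) / n_tokens,
    }
    for word in vocabulary:
        features["freq:" + word] = counts[word] / n_tokens
    return features
```

A feature dictionary like this, computed per document, could be flattened into a fixed-length vector and passed to a classifier such as a neural network; the abstract does not specify how the features were actually encoded or which network architecture was used.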