Authorship Attribution in Bengali Literature Using fastText's Hierarchical Classifier
Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Md Saiful Islam
2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT)
Published: 2018-09-01
DOI: 10.1109/CEEICT.2018.8628109
Citations: 3
Abstract
Authorship attribution is the classification problem of identifying the original author of a piece of text by analyzing other literary works by the same author. A substantial number of approaches have been applied to this problem in Bengali literature, including machine learning techniques and natural language processing methods. Most of these classification approaches, especially the machine learning methods, have high computational requirements and time complexity for model training and testing whenever a large amount of data has to be processed. This restricts their use on everyday electronic devices and on computers with lower system specifications. In this paper, we take a different approach to the classification problem using Facebook's open-source fastText library, chosen specifically for its low computational requirements and fast model training, to assess whether it can be used in applications on regular computers and mobile systems. Our fastText approach produced convincing results, outperforming the accuracy of traditional machine learning classifiers, more specifically Naive Bayes, whenever longer n-grams were used as feature sets. FastText also trained its classification models much faster than Support Vector Machines (SVM) and Naive Bayes did.
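To make the idea of a fastText-style classifier over word n-gram features more concrete, here is a rough Python sketch. It averages embeddings of hashed word n-grams into a document vector and feeds that vector to a linear softmax trained with plain SGD. This is not the authors' code and not the fastText library. The class `TinyFastText`, its parameters (`dim`, `buckets`, `max_n`, `epochs`, `lr`), the crc32 hashing, the whitespace tokenization, and the toy defaults are all hypothetical. For simplicity it also uses a flat softmax instead of fastText's hierarchical softmax.

```python
import math
import random
import zlib
from typing import Dict, List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], max_n: int) -> List[str]:
    """All contiguous word n-grams of length 1..max_n (unigrams included)."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


class TinyFastText:
    """Hypothetical fastText-style sketch (not the fastText library): mean of hashed
    n-gram embeddings, then a linear layer and a flat softmax, trained with plain SGD."""

    def __init__(self, labels: Sequence[str], dim: int = 10,
                 buckets: int = 100_003, max_n: int = 2, seed: int = 0) -> None:
        self.labels = list(labels)
        self.dim, self.buckets, self.max_n = dim, buckets, max_n
        self.rng = random.Random(seed)
        self.emb: Dict[int, List[float]] = {}  # bucket id -> input vector, created lazily
        self.out = [[0.0] * dim for _ in self.labels]  # output weights, one row per label

    def _ids(self, text: str) -> List[int]:
        # Hashing trick: whitespace tokens -> n-grams -> bucket ids (crc32 is deterministic).
        grams = word_ngrams(text.split(), self.max_n)
        return [zlib.crc32(g.encode("utf-8")) % self.buckets for g in grams]

    def _vec(self, i: int) -> List[float]:
        # Small uniform initialisation in [-1/dim, 1/dim] on first use.
        if i not in self.emb:
            r = 1.0 / self.dim
            self.emb[i] = [self.rng.uniform(-r, r) for _ in range(self.dim)]
        return self.emb[i]

    def _forward(self, ids: List[int]) -> Tuple[List[float], List[float]]:
        # Document vector = average of its n-gram vectors; then softmax over label scores.
        hidden = [0.0] * self.dim
        for i in ids:
            for j, v in enumerate(self._vec(i)):
                hidden[j] += v / len(ids)
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
        total = sum(exps)
        return hidden, [e / total for e in exps]

    def fit(self, data: Sequence[Tuple[str, str]], epochs: int = 100, lr: float = 0.5) -> None:
        data = list(data)  # copy so the caller's list is not reordered
        for _ in range(epochs):
            self.rng.shuffle(data)
            for text, label in data:
                ids = self._ids(text)
                if not ids:
                    continue
                y = self.labels.index(label)
                hidden, probs = self._forward(ids)
                # Cross-entropy gradient w.r.t. each label score: p_k - 1[k == y].
                g = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
                # Gradient w.r.t. the document vector, computed before updating `out`.
                g_hidden = [sum(g[k] * self.out[k][j] for k in range(len(g)))
                            for j in range(self.dim)]
                for k, gk in enumerate(g):
                    for j in range(self.dim):
                        self.out[k][j] -= lr * gk * hidden[j]
                # Each n-gram occurrence receives 1/len(ids) of the document-vector gradient.
                for i in ids:
                    vec = self.emb[i]
                    for j in range(self.dim):
                        vec[j] -= lr * g_hidden[j] / len(ids)

    def predict(self, text: str) -> str:
        ids = self._ids(text)
        if not ids:
            return self.labels[0]  # no tokens: fall back to the first label
        _, probs = self._forward(ids)
        return self.labels[max(range(len(probs)), key=probs.__getitem__)]
```

To use the sketch, call `TinyFastText(labels=[...]).fit(pairs)` on (text, author) pairs, then call `predict(text)` to get the most probable author. Raising `max_n` adds longer n-grams to the feature set, which is the setting the abstract associates with fastText outperforming Naive Bayes. The real fastText command-line tool takes a training file in which each line begins with `__label__<author>` followed by the text, and a command such as `fasttext supervised -input train.txt -output model -wordNgrams 3 -loss hs`. The abstract does not state which hyperparameters the authors used.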