Authorship Attribution in Bengali Literature Using fastText's Hierarchical Classifier

2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), Pub Date: 2018-09-01, DOI: 10.1109/CEEICT.2018.8628109

Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Md Saiful Islam


Citations: 3

Abstract

Authorship attribution is the classification problem of identifying the original author of a piece of text by analyzing other literary works by the same author. Numerous approaches have been taken to solve this problem in Bengali literature, including machine learning techniques and natural language processing methods. Most of these classification approaches, especially the machine learning methods, have high computational requirements and time complexity for model training and testing whenever a large amount of data must be processed. This restricts their use on everyday electronic devices and computers with lower system specifications. In this paper, we take a different approach, exploring the classification problem using Facebook's open-source fastText technology, chosen specifically for its low computational requirements and faster model training, to assess whether it can be used in applications on regular computers and mobile systems. Our approach with fastText produced convincing results, outperforming traditional machine-learning-based classifiers, more specifically Naive Bayes, in accuracy whenever longer n-grams were used as feature sets. fastText also trained its classification models much faster than Support Vector Machines (SVM) and Naive Bayes.
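The abstract does not give the training setup, so the following is only a minimal sketch of how an author classifier could be trained with the fastText Python bindings. It assumes the third-party `fasttext` package, word-level n-grams (the abstract does not say whether word or character n-grams were used), and illustrative hyperparameters. The helper names, author labels, and toy samples are hypothetical and not taken from the paper. fastText's supervised mode reads one sample per line, prefixed with `__label__<class>`, and `loss="hs"` selects the hierarchical softmax named in the title.

```python
from pathlib import Path

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(author: str, text: str) -> str:
    """Format one sample as '__label__<author> <text>' on a single line."""
    label = author.strip().replace(" ", "_")  # labels must not contain spaces
    body = " ".join(text.split())  # collapse newlines and repeated spaces
    return f"{LABEL_PREFIX}{label} {body}"


def write_training_file(samples, path):
    """Write (author, text) pairs to a UTF-8 file, one sample per line."""
    lines = [to_fasttext_line(author, text) for author, text in samples]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def train_author_classifier(train_path, word_ngrams=2):
    """Train a supervised model with hierarchical softmax (loss='hs').

    Requires the third-party `fasttext` package. The hyperparameters are
    illustrative assumptions, not the values used in the paper.
    """
    import fasttext  # imported lazily so the helpers above work without it

    return fasttext.train_supervised(
        input=str(train_path),
        loss="hs",               # hierarchical softmax
        wordNgrams=word_ngrams,  # longest word n-gram used as a feature
        epoch=25,
        lr=0.5,
    )


# Hypothetical toy data; the paper's corpus is not reproduced here.
samples = [
    ("Author A", "আমি  বাংলায়\nগান গাই"),
    ("Author B", "নদীর ধারে একটি গ্রাম"),
]
formatted = [to_fasttext_line(author, text) for author, text in samples]
```

After writing the samples with `write_training_file` and calling `train_author_classifier` on the resulting file, `model.predict(text)` returns the most likely label together with its probability. Raising `word_ngrams` is one way to experiment with the longer n-gram feature sets the abstract mentions.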
