Ankita Dhar, Himadri Mukherjee, Niladri Sekhar Dash, K. Roy
Performance of Classifiers in Bangla Text Categorization
2018 International Conference on Innovations in Science, Engineering and Technology (ICISET), pp. 168-173. Published 2018-10-01.
DOI: https://doi.org/10.1109/ICISET.2018.8745621
Citations: 9
Abstract
Automated text categorization, or text classification, has become an important text mining task, especially given the rapid growth in the number of online documents. An automatic text classification system aims to assign text documents to predefined categories based on linguistic characteristics. Although research has progressed significantly for languages such as English, Arabic, and Chinese, there has been little development for Indian languages, especially Bangla, which is one of the most popular languages of India and Bangladesh. One reason is the inherent complexity of Bangla, compounded by the unavailability of standard datasets and resources. This paper presents the performance of different classifiers on text classification using ‘term association’ and ‘term aggregation’ feature extraction methods; an accuracy of 98.68% was obtained on a dataset of 8000 Bangla text documents procured from various web sources.
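
As a rough illustration of the task described in the abstract (assigning documents to predefined categories and reporting accuracy), the following is a minimal sketch only. It does not reproduce the paper's ‘term association’ or ‘term aggregation’ features, nor its classifiers; a plain bag-of-words multinomial Naive Bayes is swapped in as an assumption. The function names, the two categories, and the tiny English token lists (stand-ins for tokenized Bangla documents) are all hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes (assumed model)."""
    class_counts = Counter(labels)          # documents per category
    term_counts = defaultdict(Counter)      # category -> term frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict(model, tokens):
    """Return the category with the highest log-posterior for one document."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in tokens:
            if term in vocab:  # skip terms never seen during training
                score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted category matches the true one."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Toy, invented data: NOT the paper's 8000-document Bangla dataset.
train_docs = [
    ["match", "team", "goal"],
    ["team", "player", "score"],
    ["market", "bank", "price"],
    ["bank", "loan", "price"],
]
train_labels = ["sports", "sports", "business", "business"]
test_docs = [["team", "goal", "score"], ["bank", "market", "loan"]]
test_labels = ["sports", "business"]

model = train_naive_bayes(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))
```

Accuracy here is simply the share of held-out documents placed in the correct category, which is the kind of figure the abstract reports. Comparing several classifiers would mean fitting each one on the same training split and scoring it on the same test split.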