Performance Evaluation of Applying N-Gram Based Naïve Bayes Classifier for Hierarchical Classification

J. Shah
2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), March 2019
DOI: 10.1109/ICCMC.2019.8819751
Citations: 1

Abstract

Text classification is the process of assigning one or more class labels to a text document. The task becomes difficult when a classification problem has many categories and some of those categories have few training documents; recall is lower for the sparsely represented categories. To handle classification problems with many categories, and to exploit the parent-child and sibling relationships between categories in user profiles and document profiles for content-based filtering, hierarchical classification is a better approach. Its main issue is error propagation: an error made at an early level of the hierarchy carries forward to all levels below it, so misclassification at early levels must be reduced. Term ambiguity is one possible source of classification error. The Naïve Bayes method is widely used for text classification because it requires little time for training and testing, but it assumes that terms are conditionally independent given the class; on data where terms are in fact dependent on each other, its performance degrades. In this paper, a word-level n-gram Multinomial Naïve Bayes classification method is combined with hierarchical classification to reduce misclassification at early levels of the hierarchy and to improve content-based filtering. The proposed algorithm also suggests a way to reduce the execution time required to compute term probabilities for the n-gram Naïve Bayes model.
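The core idea of using word-level n-grams to soften the independence assumption can be sketched with a Multinomial Naïve Bayes classifier over unigram-plus-bigram features. This is an illustrative scikit-learn sketch, not the paper's implementation; the corpus, category names, and `ngram_range=(1, 2)` setting are assumptions chosen to show how bigram context disambiguates a term like "bank".

```python
# Illustrative sketch (not the paper's algorithm): word-level n-gram
# features feeding a Multinomial Naive Bayes classifier.
# The training documents and labels below are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "the bank approved the loan application",
    "interest rates on savings accounts rose",
    "the river bank flooded after heavy rain",
    "hikers camped along the river bank",
]
train_labels = ["finance", "finance", "nature", "nature"]

# ngram_range=(1, 2) adds word bigrams to the unigram features, so
# context such as "river bank" vs. "the bank" can disambiguate the
# ambiguous unigram "bank".
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    MultinomialNB(alpha=1.0),  # Laplace smoothing for unseen n-grams
)
model.fit(train_docs, train_labels)

print(model.predict(["loan approved by the bank"])[0])  # -> finance
```

With plain unigrams the word "bank" appears equally often in both classes; the bigram "the bank" occurs only in the finance training document here, which tips the prediction. In a hierarchical setting, one such classifier would be trained per node to route a document to the correct child category.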