Performance Evaluation of Applying N-Gram Based Naïve Bayes Classifier for Hierarchical Classification

J. Shah
2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), March 2019
DOI: 10.1109/ICCMC.2019.8819751
Citations: 1

Abstract

Text classification is the process of assigning one or more class labels to a text document. The task becomes difficult when a classification problem has many categories and some of those categories have few training documents; recall is lower for the sparsely represented categories. To handle classification problems with many categories, and to exploit the parent-child and sibling relationships between categories in user profiles and document profiles for content-based filtering, hierarchical classification is a better approach. Its main issue is error propagation: an error made at an early level of the hierarchy carries forward to all levels below it, so misclassification at early levels must be reduced. Term ambiguity is one possible source of classification error. The Naïve Bayes method is widely used for text classification because it requires little time for training and testing, but it assumes that terms are conditionally independent given the class; on data where terms are in fact dependent on each other, its performance degrades. In this paper, a word-level n-gram Multinomial Naïve Bayes classification method is combined with hierarchical classification to reduce misclassification at early levels of the hierarchy and to improve content-based filtering. The proposed algorithm also suggests a way to reduce the execution time required to compute term probabilities for the n-gram Naïve Bayes model.
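The core idea of using word-level n-grams to soften the independence assumption can be sketched with a Multinomial Naïve Bayes classifier over unigram-plus-bigram features. This is an illustrative scikit-learn sketch, not the paper's implementation; the corpus, category names, and `ngram_range=(1, 2)` setting are assumptions chosen to show how bigram context disambiguates a term like "bank".

```python
# Illustrative sketch (not the paper's algorithm): word-level n-gram
# features feeding a Multinomial Naive Bayes classifier.
# The training documents and labels below are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "the bank approved the loan application",
    "interest rates on savings accounts rose",
    "the river bank flooded after heavy rain",
    "hikers camped along the river bank",
]
train_labels = ["finance", "finance", "nature", "nature"]

# ngram_range=(1, 2) adds word bigrams to the unigram features, so
# context such as "river bank" vs. "the bank" can disambiguate the
# ambiguous unigram "bank".
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    MultinomialNB(alpha=1.0),  # Laplace smoothing for unseen n-grams
)
model.fit(train_docs, train_labels)

print(model.predict(["loan approved by the bank"])[0])  # -> finance
```

With plain unigrams the word "bank" appears equally often in both classes; the bigram "the bank" occurs only in the finance training document here, which tips the prediction. In a hierarchical setting, one such classifier would be trained per node to route a document to the correct child category.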