Performance Evaluation of Applying N-Gram Based Naïve Bayes Classifier for Hierarchical Classification
J. Shah
2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), 2019-03-01
DOI: 10.1109/ICCMC.2019.8819751 (https://doi.org/10.1109/ICCMC.2019.8819751)
Text classification is the process of assigning one or more class labels to a text document. When a classification problem has many categories and some of them have few training documents, the task becomes difficult, and recall is lower for the categories with few training documents. Hierarchical classification is a better approach for handling many categories and for taking into account parent-child and sibling relationships between categories in the user profile and document profile for content-based filtering. The main issue with hierarchical classification is error propagation: an error made at an early level of the hierarchy carries forward to all the levels below it, so misclassification at early levels needs to be reduced. Term ambiguity may be one of the reasons for classification error. The Naïve Bayes method is widely used in text classification because it takes little time for training and testing. The Naïve Bayes model assumes that terms are independent of each other given the class; for data where terms depend on each other, its performance degrades. In this paper, a word-level n-gram based Multinomial Naïve Bayes classification method is combined with hierarchical classification to reduce misclassification at early levels of the hierarchy and to improve content-based filtering. The proposed algorithm also suggests a way to reduce the execution time required to calculate term probabilities for the n-gram Naïve Bayes model.
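
To make the combination described in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes whitespace tokenization, unigrams plus bigrams as features, Laplace (add-one) smoothing, and a top-down scheme with one classifier per parent category. The names `word_ngrams`, `NGramNB`, `train_hierarchy`, and `classify`, as well as the toy categories and documents, are hypothetical. The sketch does not attempt to reproduce the paper's execution-time reduction, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return all word-level n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


class NGramNB:
    """Multinomial Naive Bayes over word n-grams with add-one smoothing."""

    def __init__(self, n_max=2):
        self.n_max = n_max

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)      # documents per class (for priors)
        self.counts = defaultdict(Counter)     # n-gram counts per class
        for text, label in zip(texts, labels):
            self.counts[label].update(word_ngrams(text, self.n_max))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        grams = word_ngrams(text, self.n_max)

        def score(label):
            # log prior + sum of smoothed log likelihoods of each n-gram
            s = math.log(self.doc_counts[label] / n_docs)
            for g in grams:
                s += math.log((self.counts[label][g] + 1) / (self.totals[label] + v))
            return s

        return max(self.doc_counts, key=score)


def train_hierarchy(docs, n_max=2):
    """docs: list of (text, path), path being a tuple such as ("sports", "tennis").
    Trains one classifier per parent node to choose among its children."""
    by_parent = defaultdict(lambda: ([], []))
    for text, path in docs:
        for depth in range(len(path)):
            texts, labels = by_parent[path[:depth]]
            texts.append(text)
            labels.append(path[depth])
    return {parent: NGramNB(n_max).fit(t, l) for parent, (t, l) in by_parent.items()}


def classify(models, text):
    """Top-down classification: a wrong choice at the root cannot be undone
    further down, which is the error propagation the abstract refers to."""
    path = ()
    while path in models:
        path += (models[path].predict(text),)
    return path


# Toy data, invented purely for illustration.
docs = [
    ("the team won the football match", ("sports", "football")),
    ("the player served an ace in the tennis match", ("sports", "tennis")),
    ("the central bank raised interest rates", ("finance", "banking")),
    ("the stock market index fell sharply", ("finance", "stocks")),
]
models = train_hierarchy(docs)
print(classify(models, "interest rates raised by the bank"))  # ('finance', 'banking')
```

In this sketch, bigrams such as "interest rates" add evidence that single terms alone do not carry, which is one way n-gram features can soften the term-independence assumption at the upper levels of the hierarchy.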