Kazuhiro Omura, Mineichi Kudo, Tomomi Endo, T. Murai
2012 12th International Conference on Intelligent Systems Design and Applications (ISDA), published 2012-11-01. DOI: 10.1109/ISDA.2012.6416651 (https://doi.org/10.1109/ISDA.2012.6416651)
Weighted naïve Bayes classifier on categorical features
Classification problems with many categorical features have become common, as seen in genetic data and text data. In this paper, we discuss several ways to assign weights to features within the framework of the naïve Bayes classifier, that is, under the independence assumption on features. Because a categorical feature has no order, we consider a histogram over its possible values (bins). Taking into account the differences in the number of samples falling in each bin, we propose two kinds of weights: 1) one derived from the probability that the majority class also holds the majority among the samples, and 2) another that reflects the expected conditional entropy. For the latter, entropy-based weight, we show that more discriminative features gain higher weights and that non-discriminative features diminish as the number of samples goes to infinity. We examine the properties of these two kinds of weights on artificial data and some real-life data.
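To make the idea of an entropy-based feature weight concrete, the following is a minimal Python sketch, not the authors' method. The abstract does not give the weight formulas, so everything specific here is an assumption: the weight is taken as 1 - H(C|X) / log K (empirical conditional entropy of the class given the feature, normalized by the number of classes K), the weights are applied as exponents on the per-feature likelihoods, and Laplace smoothing with a hypothetical parameter `alpha` is used. The names `conditional_entropy`, `entropy_weight`, `fit`, and `predict` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(values, labels):
    """Empirical H(C | X) in nats for one categorical feature."""
    n = len(labels)
    bins = defaultdict(Counter)  # bins[value][class] = sample count
    for v, c in zip(values, labels):
        bins[v][c] += 1
    h = 0.0
    for counts in bins.values():
        m = sum(counts.values())
        h_bin = -sum((k / m) * math.log(k / m) for k in counts.values())
        h += (m / n) * h_bin  # each bin is weighted by its share of samples
    return h


def entropy_weight(values, labels):
    """Assumed weight 1 - H(C|X)/log K, which lies in [0, 1]."""
    k = len(set(labels))
    if k < 2:
        return 0.0
    return 1.0 - conditional_entropy(values, labels) / math.log(k)


def fit(rows, labels, alpha=1.0):
    """Collect counts and per-feature weights from categorical rows."""
    n_features = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_features)]
    counts = [defaultdict(Counter) for _ in range(n_features)]
    for r, c in zip(rows, labels):
        for j, v in enumerate(r):
            counts[j][c][v] += 1  # counts[j][class][value]
    return {
        "classes": sorted(set(labels)),
        "prior": Counter(labels),
        "n": len(labels),
        "weights": [entropy_weight(col, labels) for col in columns],
        "domain_sizes": [len(set(col)) for col in columns],
        "counts": counts,
        "alpha": alpha,
    }


def predict(model, row):
    """Return the class maximizing log P(c) + sum_j w_j * log P(x_j | c)."""
    best, best_score = None, -math.inf
    for c in model["classes"]:
        score = math.log(model["prior"][c] / model["n"])
        for j, v in enumerate(row):
            num = model["counts"][j][c][v] + model["alpha"]
            den = model["prior"][c] + model["alpha"] * model["domain_sizes"][j]
            score += model["weights"][j] * math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best
```

With this choice, a feature whose bins each contain a single class gets weight 1, and a feature whose bins all show the same class mixture gets a weight near 0, so it contributes little to the decision. This sketch uses only the plain empirical conditional entropy; it does not model the per-bin sample-size effects that the abstract says the proposed weights take into account.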