{"title":"用于高维数据分类的朴素贝叶斯特征自动加权","authors":"Lifei Chen, Shengrui Wang","doi":"10.1145/2396761.2398426","DOIUrl":null,"url":null,"abstract":"Naive Bayes (NB for short) is one of the popular methods for supervised classification in a knowledge management system. Currently, in many real-world applications, high-dimensional data pose a major challenge to conventional NB classifiers, due to noisy or redundant features and local relevance of these features to classes. In this paper, an automated feature weighting solution is proposed to result in a NB method effective in dealing with high-dimensional data. We first propose a locally weighted probability model, for Bayesian modeling in high-dimensional spaces, to implement a soft feature selection scheme. Then we propose an optimization algorithm to find the weights in linear time complexity, based on the Logitnormal priori distribution and the Maximum a Posteriori principle. Experimental studies show the effectiveness and suitability of the proposed model for high-dimensional data classification.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Automated feature weighting in naive bayes for high-dimensional data classification\",\"authors\":\"Lifei Chen, Shengrui Wang\",\"doi\":\"10.1145/2396761.2398426\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Naive Bayes (NB for short) is one of the popular methods for supervised classification in a knowledge management system. Currently, in many real-world applications, high-dimensional data pose a major challenge to conventional NB classifiers, due to noisy or redundant features and local relevance of these features to classes. In this paper, an automated feature weighting solution is proposed to result in a NB method effective in dealing with high-dimensional data. We first propose a locally weighted probability model, for Bayesian modeling in high-dimensional spaces, to implement a soft feature selection scheme. Then we propose an optimization algorithm to find the weights in linear time complexity, based on the Logitnormal priori distribution and the Maximum a Posteriori principle. Experimental studies show the effectiveness and suitability of the proposed model for high-dimensional data classification.\",\"PeriodicalId\":313414,\"journal\":{\"name\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2396761.2398426\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2396761.2398426","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automated feature weighting in naive bayes for high-dimensional data classification
Naive Bayes (NB for short) is one of the popular methods for supervised classification in a knowledge management system. Currently, in many real-world applications, high-dimensional data pose a major challenge to conventional NB classifiers, due to noisy or redundant features and local relevance of these features to classes. In this paper, an automated feature weighting solution is proposed to result in a NB method effective in dealing with high-dimensional data. We first propose a locally weighted probability model, for Bayesian modeling in high-dimensional spaces, to implement a soft feature selection scheme. Then we propose an optimization algorithm to find the weights in linear time complexity, based on the Logitnormal priori distribution and the Maximum a Posteriori principle. Experimental studies show the effectiveness and suitability of the proposed model for high-dimensional data classification.