{"title":"精确贝叶斯分类器:结果总结","authors":"Amin Vahedian, Xun Zhou","doi":"10.1109/ICDM51629.2021.00076","DOIUrl":null,"url":null,"abstract":"The Bayes Classifier is shown to have the minimal classification error, in addition to interpretable predictions. However, it requires the knowledge of underlying distributions of the predictors to be usable. This requirement is almost never satisfied. Naive Bayes classifiers and variants estimate this classifier by assuming the independence among predictors. This restrictive assumption hinders both the accuracy of these classifiers and their interpretability, as the calculated probabilities become less reliable. Moreover, it is argued in the literature that interpretability comes at the expense of accuracy and vice versa. In this paper, we are motivated by the accurate and interpretable nature of the Bayes Classifier. We propose Precise Bayes, which is a computationally efficient estimation of the Bayes Classifier based on a new formulation. Our method makes no assumptions, neither on independence nor on underlying distributions. We devise a new theoretical minimal error rate for our formulation and show that the error rate of Precise Bayes approaches this limit with increasing number of samples learned. Moreover, the calculated posterior probabilities, are actual empirical probabilities calculated by counting the observations and outcomes. This makes the predictions made by Precise Bayes fully explainable. Our evaluations on generated datasets and real datasets validate our theoretical claims on prediction error rate and computational efficiency.","PeriodicalId":320970,"journal":{"name":"2021 IEEE International Conference on Data Mining (ICDM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Precise Bayes Classifier: Summary of Results\",\"authors\":\"Amin Vahedian, Xun Zhou\",\"doi\":\"10.1109/ICDM51629.2021.00076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Bayes Classifier is shown to have the minimal classification error, in addition to interpretable predictions. However, it requires the knowledge of underlying distributions of the predictors to be usable. This requirement is almost never satisfied. Naive Bayes classifiers and variants estimate this classifier by assuming the independence among predictors. This restrictive assumption hinders both the accuracy of these classifiers and their interpretability, as the calculated probabilities become less reliable. Moreover, it is argued in the literature that interpretability comes at the expense of accuracy and vice versa. In this paper, we are motivated by the accurate and interpretable nature of the Bayes Classifier. We propose Precise Bayes, which is a computationally efficient estimation of the Bayes Classifier based on a new formulation. Our method makes no assumptions, neither on independence nor on underlying distributions. We devise a new theoretical minimal error rate for our formulation and show that the error rate of Precise Bayes approaches this limit with increasing number of samples learned. Moreover, the calculated posterior probabilities, are actual empirical probabilities calculated by counting the observations and outcomes. This makes the predictions made by Precise Bayes fully explainable. 
Our evaluations on generated datasets and real datasets validate our theoretical claims on prediction error rate and computational efficiency.\",\"PeriodicalId\":320970,\"journal\":{\"name\":\"2021 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM51629.2021.00076\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM51629.2021.00076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: The Bayes Classifier is known to attain the minimal classification error while also producing interpretable predictions. However, it requires knowledge of the underlying distributions of the predictors, a requirement that is almost never satisfied in practice. Naive Bayes classifiers and their variants approximate the Bayes Classifier by assuming independence among the predictors. This restrictive assumption hurts both the accuracy of these classifiers and their interpretability, as the calculated probabilities become less reliable. Moreover, it is argued in the literature that interpretability comes at the expense of accuracy, and vice versa. In this paper, we are motivated by the accurate and interpretable nature of the Bayes Classifier. We propose Precise Bayes, a computationally efficient estimation of the Bayes Classifier based on a new formulation. Our method makes no assumptions, either of independence among predictors or about their underlying distributions. We derive a new theoretical minimal error rate for our formulation and show that the error rate of Precise Bayes approaches this limit as the number of learned samples grows. Moreover, the calculated posterior probabilities are true empirical probabilities, obtained by counting observations and outcomes, which makes the predictions of Precise Bayes fully explainable. Our evaluations on synthetic and real datasets validate our theoretical claims on prediction error rate and computational efficiency.
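For context, the contrast the abstract draws can be stated formally; this is standard textbook notation, not a formulation taken from the paper itself. The Bayes Classifier assigns each input the class with the largest posterior probability, while naive Bayes makes the class-conditional likelihood tractable by factorizing it per feature, which is exactly the independence assumption criticized above:

$$h^*(x) = \arg\max_{y} P(Y = y \mid X = x) = \arg\max_{y} P(Y = y)\, P(X = x \mid Y = y)$$

$$\text{Naive Bayes: } P(X = x \mid Y = y) \approx \prod_{j=1}^{d} P(X_j = x_j \mid Y = y)$$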
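The abstract states that the posteriors of Precise Bayes are empirical probabilities obtained "by counting the observations and outcomes." As a minimal sketch of what counting-based posterior estimation looks like for discrete predictors, consider the toy classifier below; the class name `CountingPosteriorClassifier` and all implementation details are hypothetical illustrations, since the paper's actual formulation is not reproduced here.

```python
from collections import Counter

class CountingPosteriorClassifier:
    """Toy counting-based estimate of the posterior P(y | x) for
    discrete predictors: P(y | x) ~= count(x, y) / count(x).
    Hypothetical sketch only; not the paper's actual formulation."""

    def __init__(self):
        self.joint = Counter()        # counts of (x, y) pairs
        self.marginal = Counter()     # counts of x alone
        self.class_counts = Counter() # counts of y alone (class priors)

    def fit(self, X, y):
        for xi, yi in zip(X, y):
            key = tuple(xi)
            self.joint[(key, yi)] += 1
            self.marginal[key] += 1
            self.class_counts[yi] += 1
        return self

    def posterior(self, x):
        key = tuple(x)
        n = self.marginal[key]
        if n == 0:
            # Unseen predictor combination: the empirical posterior is
            # undefined, so fall back to the empirical class priors.
            total = sum(self.class_counts.values())
            return {c: k / total for c, k in self.class_counts.items()}
        return {c: self.joint[(key, c)] / n for c in self.class_counts}

    def predict(self, x):
        post = self.posterior(x)
        return max(post, key=post.get)


# Usage: every probability is a ratio of observed counts, so each
# prediction can be explained by pointing at the training data.
X = [(0, 1), (0, 1), (0, 1), (1, 0), (1, 0)]
y = ["a", "a", "b", "b", "b"]
clf = CountingPosteriorClassifier().fit(X, y)
print(clf.posterior((0, 1)))  # {'a': 0.666..., 'b': 0.333...}
print(clf.predict((0, 1)))    # 'a'
```

Note that a naive implementation of this idea degrades as the predictor space grows, since most combinations are never observed; the abstract's claims of computational efficiency and a provable error limit suggest the paper's formulation addresses this, but those details are beyond what the abstract provides.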