Authors: Yuanzhe Wang, Qiang Zhang, Liyuan Bai
Published in: 2009 IEEE International Symposium on IT in Medicine & Education, 2009-09-15
DOI: 10.1109/ITIME.2009.5236431 (https://doi.org/10.1109/ITIME.2009.5236431)
Building naive Bayes document classifier using word clusters based on bootstrap averaging
To address the low classification accuracy that results from poor distribution estimation when a naive Bayes document classifier is trained on word clusters, we build a sequential word list based on the mutual information between words and their semantic cluster labels. We then construct, through bootstrap sampling, a sample set of the same size as the word list, and use the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than naive Bayes document classifiers trained on word clusters or on words.
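The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of bootstrap averaging for cluster-level naive Bayes parameters, not the authors' procedure. It assumes that words are resampled with replacement from the word list, that each resample yields Laplace-smoothed estimates of P(cluster | class), and that the final parameters are the plain mean over resamples. The names `estimate_params` and `bootstrap_average`, the smoothing constant `alpha`, the number of rounds, and the toy data are all hypothetical.

```python
import random


def estimate_params(words, word_class_counts, word_to_cluster,
                    classes, clusters, alpha=1.0):
    """Laplace-smoothed P(cluster | class) from a multiset of words.

    Assumption: a word drawn several times in a resample contributes
    its class counts once per draw.
    """
    counts = {c: {k: 0.0 for k in clusters} for c in classes}
    for w in words:
        k = word_to_cluster[w]
        for c in classes:
            counts[c][k] += word_class_counts[w].get(c, 0)
    params = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(clusters)
        params[c] = {k: (counts[c][k] + alpha) / total for k in clusters}
    return params


def bootstrap_average(word_list, word_class_counts, word_to_cluster,
                      classes, clusters, n_rounds=50, seed=0):
    """Average P(cluster | class) over bootstrap resamples of the word list."""
    rng = random.Random(seed)
    avg = {c: {k: 0.0 for k in clusters} for c in classes}
    for _ in range(n_rounds):
        # Each resample has the same size as the word list.
        resample = rng.choices(word_list, k=len(word_list))
        params = estimate_params(resample, word_class_counts,
                                 word_to_cluster, classes, clusters)
        for c in classes:
            for k in clusters:
                avg[c][k] += params[c][k] / n_rounds
    return avg


# Hypothetical toy data: four words, two clusters, two document classes.
word_list = ["goal", "match", "vote", "senate"]
word_to_cluster = {"goal": "sports", "match": "sports",
                   "vote": "politics", "senate": "politics"}
word_class_counts = {
    "goal": {"sport": 8, "news": 1},
    "match": {"sport": 5, "news": 2},
    "vote": {"sport": 0, "news": 7},
    "senate": {"news": 4},
}
classes = ["sport", "news"]
clusters = ["sports", "politics"]

avg_params = bootstrap_average(word_list, word_class_counts,
                               word_to_cluster, classes, clusters)
for c in classes:
    print(c, avg_params[c])
```

Because each resample produces a normalized distribution per class, their mean is also normalized, so the averaged values can be used directly as the class-conditional probabilities in the usual naive Bayes decision rule.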