Maximum Entropy framework used in text classification
Hui Wang, Lin Wang, Lixia Yi. 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, published 2010-12-06. DOI: 10.1109/ICICISYS.2010.5658639
In this paper, the Maximum Entropy (ME) framework is used to classify text documents. Compared with other supervised learning algorithms such as the naive Bayes classifier, the ME framework has several advantages. For example, it makes no inherent assumption that terms are conditionally independent. Extensive experiments on four labeled data sets compare the accuracy of the ME algorithm with that of naive Bayes and Support Vector Machines (SVM), two popular and accurate algorithms for text classification. Overall, the ME method consistently outperforms both naive Bayes and SVM in accuracy. On the WebKB and Industry Sector data sets, the ME algorithm raises accuracy from 81.38% to 85.52% and from 85.73% to 89.78%, respectively. On the third data set, 20 Newsgroups, our experimental result is the opposite of that reported by Nigam et al. On the fourth data set, Reuters-21578, the ME algorithm raises accuracy from 94.76% to 96.16%.
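The abstract does not spell out the model, so the following is only a minimal sketch assuming the standard conditional ME form used for text by Nigam et al.: P(c|d) = exp(Σ_i λ_i f_i(d,c)) / Z(d), with one feature per (word, class) pair whose value is the word's relative frequency in document d. Unlike naive Bayes, which multiplies per-word probabilities as if words were independent given the class, all λ weights here are fit jointly by maximizing conditional log-likelihood. The class name `MaxEntClassifier`, the use of plain gradient ascent (instead of an iterative-scaling trainer), the L2 penalty, and every hyperparameter are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


class MaxEntClassifier:
    """Hypothetical conditional maximum-entropy text classifier (multinomial logistic regression).

    One feature per (word, class) pair, valued by the word's relative frequency
    in the document; all weights are fit jointly by gradient ascent.
    """

    def __init__(self, learning_rate=1.0, epochs=300, l2=0.01):
        self.lr = learning_rate
        self.epochs = epochs
        self.l2 = l2
        self.weights = {}  # lambda for each (word, class) feature
        self.classes = []

    @staticmethod
    def _features(doc):
        # Relative word frequencies; an empty document has no active features.
        words = doc.lower().split()
        n = len(words) or 1
        return {w: c / n for w, c in Counter(words).items()}

    def _posterior(self, feats):
        # P(c|d) = exp(sum_i lambda_i f_i(d, c)) / Z(d), shifted by the max score for stability.
        scores = {c: sum(self.weights.get((w, c), 0.0) * v for w, v in feats.items())
                  for c in self.classes}
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        data = [(self._features(d), y) for d, y in zip(docs, labels)]
        for _ in range(self.epochs):
            grad = {}
            for feats, y in data:
                post = self._posterior(feats)
                for c in self.classes:
                    # Observed minus expected value of each (word, c) feature.
                    diff = (1.0 if c == y else 0.0) - post[c]
                    for w, v in feats.items():
                        grad[(w, c)] = grad.get((w, c), 0.0) + diff * v
            # One gradient-ascent step on the L2-penalized mean log-likelihood.
            for key in set(grad) | set(self.weights):
                old = self.weights.get(key, 0.0)
                self.weights[key] = old + self.lr * (grad.get(key, 0.0) / len(data) - self.l2 * old)
        return self

    def predict_proba(self, doc):
        return self._posterior(self._features(doc))

    def predict(self, doc):
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


if __name__ == "__main__":
    docs = ["free money win prize", "win cash prize now",
            "meeting schedule for project", "project report and meeting notes"]
    labels = ["spam", "spam", "ham", "ham"]
    clf = MaxEntClassifier().fit(docs, labels)
    print(clf.predict("win a free prize"), clf.predict_proba("project meeting"))
```

Words never seen in training (such as "a" in the usage line) simply contribute zero weight, so they neither help nor hurt any class; a real system would add a richer feature set and a tuned regularization strength.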