Using association features to enhance the performance of Naive Bayes text classifier
Authors: Zhang Yang, Z. Lijun, Jianfeng Yan, Zhanhuai Li
Published in: Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003 (2003-09-27)
DOI: 10.1109/ICCIMA.2003.1238148 (https://doi.org/10.1109/ICCIMA.2003.1238148)
The co-occurrence of words can contribute to automatic text classification. However, this information cannot be represented in the feature set when only primitive features are used, and it can only be partially represented when n-grams are used as features. In this paper, we define a novel feature, the association feature, to describe this information. To ensure that the selected association features are good discriminators, we propose an approach for creating the association feature set that includes a redundancy pruning algorithm and a feature selection algorithm. The experimental results show that the performance of a Naive Bayes text classifier can be improved by using association features, which also indicates that the selected set of association features contributes more to text classification than primitive features and n-grams.
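To make the idea concrete, here is a minimal sketch of feeding word co-occurrence into a Naive Bayes classifier. It is not the authors' method: the abstract does not give the definition of an association feature, so the sketch assumes it is an unordered pair of words that appear in the same document, and it leaves out the redundancy pruning and feature selection steps entirely. The names `extract_features`, `NaiveBayes`, and the toy training data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def extract_features(text, use_pairs=True):
    """Return primitive word features plus, optionally, pairwise features."""
    words = sorted(set(text.lower().split()))
    features = list(words)
    if use_pairs:
        # Assumption: an association feature is an unordered pair of words
        # that co-occur anywhere in the same document (not only adjacent
        # words, which is what a bigram would capture).
        features += [a + "&" + b for a, b in combinations(words, 2)]
    return features


class NaiveBayes:
    """Naive Bayes over feature presence counts with add-one smoothing."""

    def fit(self, docs, labels, use_pairs=True):
        self.use_pairs = use_pairs
        self.class_counts = Counter(labels)
        self.feature_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(docs, labels):
            self.feature_counts[label].update(extract_features(text, use_pairs))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / total_docs)
            denom = sum(self.feature_counts[c].values()) + len(self.vocab)
            for f in extract_features(text, self.use_pairs):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feature_counts[c][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data, invented purely for illustration.
docs = ["hot dog", "hot dog bun", "hot weather", "dog park"]
labels = ["food", "food", "other", "other"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("hot dog"))
print(model.predict("dog park"))
```

In this sketch the pair `dog&hot` is a feature in its own right, so the classifier can weight the co-occurrence separately from the individual words. With realistic vocabularies the number of candidate pairs grows quadratically, which is presumably why the paper pairs the feature definition with pruning and selection.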