Authors: Yaohong Jin, Wen Xiong, Cong Wang
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010), 2010-09-30
DOI: https://doi.org/10.1109/NLPKE.2010.5587844
Feature selection for Chinese Text Categorization based on improved particle swarm optimization
Feature selection is an important preprocessing step in Chinese text categorization: it reduces the high dimensionality of the feature space and, unlike feature extraction, keeps the reduced feature set comprehensible. A novel criterion for coarse feature filtering is proposed that combines the strengths of term frequency-inverse document frequency (TF-IDF) as an inner-class measure and the chi-square (CHI) statistic as an inter-class measure. A new feature selection (FS) method for Chinese text categorization based on swarm intelligence is also presented: it uses an improved particle swarm optimization (PSO) to make a fine-grained selection from the features that survive the coarse-grained filtering, evaluates each candidate feature subset with a support vector machine (SVM), and takes that evaluation as the particle's fitness. Experiments on the Fudan University Chinese Text Classification Corpus show that the new filtering criterion yields higher classification accuracy and that the new FS method attains an effective feature reduction ratio.
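The two-stage idea can be illustrated with a minimal sketch, under several assumptions. The abstract does not give the formula that combines TF-IDF with CHI, nor the specific improvements made to PSO, so the code below shows only the textbook chi-square statistic for a term and a class, and a standard binary PSO over feature masks. The names `chi_square`, `binary_pso`, `toy_fitness`, and `INFORMATIVE` are hypothetical, and `toy_fitness` is a stand-in for the SVM-based evaluation the paper uses as particle fitness.

```python
import math
import random


def chi_square(a, b, c, d):
    """Chi-square statistic of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def binary_pso(n_features, fitness, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Standard binary PSO (not the paper's improved variant).

    Each particle is a 0/1 mask over features; `fitness` maps a mask to a
    score to maximize. Returns the best mask found and its score.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                # Pull the velocity toward the personal and global bests.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                # The sigmoid of the velocity is the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for SVM accuracy: reward a few "informative"
# feature indices and penalize the size of the selected subset.
INFORMATIVE = (0, 2, 5)


def toy_fitness(mask):
    return sum(mask[j] for j in INFORMATIVE) - 0.1 * sum(mask)


if __name__ == "__main__":
    print(chi_square(10, 0, 0, 10))
    print(binary_pso(8, toy_fitness, seed=1))
```

In a real pipeline, `fitness` would train and score a classifier on the columns selected by the mask. The size penalty in `toy_fitness` is only one possible way to encourage a high feature reduction ratio and is not taken from the paper.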