Guohua Wu, Liuyang Wang, Nailiang Zhao, Hairong Lin
2015 International Conference on Computer Science and Mechanical Automation (CSMA), published 2015-10-23. DOI: 10.1109/CSMA.2015.17 (https://doi.org/10.1109/CSMA.2015.17)
Improved Expected Cross Entropy Method for Text Feature Selection
Feature selection plays an important role in text categorization and contributes directly to categorization accuracy. Because the traditional expected cross entropy algorithm does not take document frequency into account, we first improve the traditional expected cross entropy formula and then propose an improved text feature selection method based on word frequency information. The method modifies the expected cross entropy algorithm in three respects: the frequency of a feature within a category, its frequency distribution within a category, and its frequency distribution among different categories. Text categorization results show that the improved expected cross entropy feature selection approach achieves better categorization performance.
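For orientation, the baseline that the abstract sets out to improve can be sketched as follows. The abstract gives no formulas, so this assumes the commonly cited form of expected cross entropy, ECE(t) = P(t) * sum_i P(c_i|t) * log(P(c_i|t) / P(c_i)), with probabilities estimated from document counts. The function name, the toy corpus, and the estimation choices are illustrative assumptions, and the sketch does not implement the authors' frequency-based modifications.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Score terms with the commonly cited (baseline) expected cross entropy.

    Assumption: probabilities are estimated from document counts, i.e. a
    term either occurs in a document or it does not.

    docs   -- list of token lists, one per document
    labels -- list of category labels, aligned with docs
    """
    n_docs = len(docs)
    class_count = Counter(labels)        # number of documents per category
    df = Counter()                       # number of documents containing each term
    df_in_class = defaultdict(Counter)   # the same count, split by category

    for tokens, label in zip(docs, labels):
        for term in set(tokens):         # presence only; repeats are ignored
            df[term] += 1
            df_in_class[term][label] += 1

    scores = {}
    for term, n_t in df.items():
        p_t = n_t / n_docs
        total = 0.0
        # Categories in which the term never occurs contribute zero and are skipped.
        for label, n_tc in df_in_class[term].items():
            p_c_given_t = n_tc / n_t
            p_c = class_count[label] / n_docs
            total += p_c_given_t * math.log(p_c_given_t / p_c)
        scores[term] = p_t * total
    return scores


# Hypothetical toy corpus, for illustration only.
toy_docs = [["ball", "game"], ["ball", "team"], ["vote", "law"], ["vote", "game"]]
toy_labels = ["sport", "sport", "politics", "politics"]

scores = expected_cross_entropy(toy_docs, toy_labels)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "game" appears equally often in both categories and therefore scores zero, while "ball" and "vote" each occur in only one category and score highest. Because the sketch counts only whether a term is present in a document, how often the term occurs inside documents is discarded; that within-document frequency is the kind of word frequency information the abstract proposes to add.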