Improved Expected Cross Entropy Method for Text Feature Selection
Guohua Wu, Liuyang Wang, Nailiang Zhao, Hairong Lin
2015 International Conference on Computer Science and Mechanical Automation (CSMA), published 2015-10-23
DOI: 10.1109/CSMA.2015.17 (https://doi.org/10.1109/CSMA.2015.17)
Citations: 11
Abstract
Feature selection plays an important role in text categorization and directly affects categorization accuracy. Because the traditional expected cross entropy algorithm does not adequately consider document frequency, we first improve the traditional expected cross entropy formula and then propose an improved text feature selection method based on word frequency information. The method modifies the expected cross entropy algorithm in three respects: the frequency of features within a category, the frequency distribution within a category, and the frequency distribution among different categories. Text categorization results show that the improved expected cross entropy feature selection approach yields better categorization performance.
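For orientation, the traditional expected cross entropy score that the paper starts from is commonly written as ECE(t) = P(t) Σ_i P(c_i|t) log(P(c_i|t) / P(c_i)). The abstract does not give the improved formula, so the sketch below covers only this commonly cited baseline, with probabilities estimated from document counts. The function name `expected_cross_entropy` and the toy data are illustrative assumptions, and the three frequency-based modifications the authors describe are not implemented here.

```python
import math
from collections import Counter, defaultdict


def expected_cross_entropy(docs, labels):
    """Score each term with the traditional expected cross entropy.

    docs: list of token lists; labels: one category label per document.
    Probabilities are estimated from document counts (term presence only),
    which is an assumption of this sketch, not a detail from the paper.
    """
    n_docs = len(docs)
    class_counts = Counter(labels)            # documents per category
    term_docs = Counter()                     # documents containing each term
    term_class_docs = defaultdict(Counter)    # term -> category -> documents
    for tokens, label in zip(docs, labels):
        for term in set(tokens):              # count each term once per document
            term_docs[term] += 1
            term_class_docs[term][label] += 1

    scores = {}
    for term, df in term_docs.items():
        p_t = df / n_docs                     # P(t)
        total = 0.0
        for label, n_c in class_counts.items():
            p_c = n_c / n_docs                # P(c_i)
            p_c_given_t = term_class_docs[term][label] / df  # P(c_i | t)
            if p_c_given_t > 0:               # treat 0 * log 0 as 0
                total += p_c_given_t * math.log(p_c_given_t / p_c)
        scores[term] = p_t * total
    return scores


# Toy corpus (hypothetical): four documents in two categories.
docs = [["a", "b"], ["a", "c"], ["b", "c"], ["c", "d"]]
labels = ["x", "x", "y", "y"]
scores = expected_cross_entropy(docs, labels)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this baseline, a term spread evenly across categories scores zero and a term concentrated in one category scores higher. Only document-level presence enters the estimate, so how often a term occurs inside a document has no effect; the abstract's word-frequency modifications target that kind of information.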