A Study for Important Criteria of Feature Selection in Text Categorization
Yan Xu
2010 2nd International Workshop on Intelligent Systems and Applications, 2010-05-22
DOI: 10.1109/IWISA.2010.5473381 (https://doi.org/10.1109/IWISA.2010.5473381)
A major difficulty of text categorization (TC) is the high dimensionality of the feature space, and feature selection (FS) is an important step for reducing it. Empirical studies of text categorization show that good performance is related to certain feature selection criteria; when a method fails to satisfy one of them, this often indicates that the method is not optimal. According to our analysis, FS methods that perform well in TC tasks tend to share several properties, including favoring common terms, using category information, and using term frequency information. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but none of them satisfies all of these criteria. In this paper, we present some important criteria for FS in TC. Experimental results indicate that the empirical performance of an FS function is closely related to how well it satisfies these criteria.
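To make the three methods named in the abstract concrete, the following is a minimal sketch that scores a single term against a single category from document counts. It uses common textbook definitions of DF, IG, and (pointwise) MI, which may differ from the exact formulations in the paper. The function name `term_scores`, the count names `a`, `b`, `c`, `d`, and the example counts are all assumptions made up for illustration, not data from the paper.

```python
import math


def _entropy(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def term_scores(a, b, c, d):
    """Score one term for one category from a 2x2 table of document counts.

    Hypothetical count names (not from the paper):
      a: documents in the category that contain the term
      b: documents outside the category that contain the term
      c: documents in the category that lack the term
      d: documents outside the category that lack the term
    """
    n = a + b + c + d

    # Document frequency: number of documents containing the term.
    df = a + b

    # Pointwise mutual information: log2( P(t, c) / (P(t) * P(c)) ).
    mi = math.log2((a * n) / ((a + b) * (a + c))) if a > 0 else float("-inf")

    # Information gain for a binary category: H(C) - H(C | term present/absent).
    h_c = _entropy([a + c, b + d])
    h_c_given_t = ((a + b) / n) * _entropy([a, b]) + ((c + d) / n) * _entropy([c, d])
    ig = h_c - h_c_given_t

    return {"df": df, "mi": mi, "ig": ig}


# Illustrative counts for 100 documents, 50 per category (assumed values).
common = term_scores(a=40, b=10, c=10, d=40)  # frequent, fairly discriminative term
rare = term_scores(a=2, b=0, c=48, d=50)      # very rare term seen only in the category

print(common)
print(rare)
```

With these assumed counts, DF and IG rank the common term above the rare one, while MI ranks the rare term higher, which illustrates how a method can use category information without favoring common terms. All three scores in this sketch are computed from document counts only, so none of them uses within-document term frequency.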