Feature selection method based on crossed centroid for text categorization
Jieming Yang, Zhiying Liu, Zhaoyang Qu, Junchang Wang
15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), September 2014
DOI: 10.1109/SNPD.2014.6888675
Citations: 5
Abstract
The most important characteristic of text categorization is its high dimensionality, even for moderate-sized datasets. Feature selection, which can reduce the dimensionality without sacrificing categorization performance and helps avoid over-fitting, is a commonly used approach to dimensionality reduction. In this paper, we propose a new feature selection method that evaluates the deviation of each term from the centroid based on both inter-category and intra-category information. We compared the proposed method with four well-known feature selection algorithms using support vector machines on three benchmark datasets (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the proposed method can significantly improve the performance of the classifier.
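The abstract does not give the paper's exact scoring formula, so the following is only a minimal sketch of the general idea it describes: score each term by how far the per-category centroids deviate from the global centroid (inter-category) relative to how much documents scatter around their own category centroid (intra-category), then keep the top-scoring terms. The function names, the squared-deviation measure, and the ratio used to combine the two quantities are assumptions for illustration, not the authors' method.

```python
import numpy as np

def centroid_deviation_scores(X, y):
    """Sketch of a centroid-deviation feature score.

    X : (n_docs, n_terms) term-frequency matrix
    y : (n_docs,) integer category labels

    inter: squared deviation of each category centroid from the global centroid.
    intra: mean squared deviation of documents from their own category centroid.
    The exact combination used in the paper is not reproduced here.
    """
    global_centroid = X.mean(axis=0)            # (n_terms,)
    inter = np.zeros(X.shape[1])
    intra = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        centroid_c = Xc.mean(axis=0)
        inter += (centroid_c - global_centroid) ** 2      # spread between categories
        intra += ((Xc - centroid_c) ** 2).mean(axis=0)    # spread within this category
    # Terms with large inter-category and small intra-category deviation
    # are assumed to be more discriminative.
    return inter / (intra + 1e-12)

def select_top_k_terms(X, y, k):
    """Return the indices of the k highest-scoring terms."""
    scores = centroid_deviation_scores(X, y)
    return np.argsort(scores)[::-1][:k]
```

In a typical pipeline, the selected term indices would be used to slice the document-term matrix before training a classifier such as an SVM, mirroring the experimental setup described in the abstract.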