Title: A hybrid feature selection model for text clustering
Authors: A. Alsaeedi, M. A. Fattah, Khalid S. Aloufi
Venue: 2017 7th IEEE International Conference on System Engineering and Technology (ICSET)
Published: 2017-10-01
DOI: 10.1109/ICSENGT.2017.8123411
URL: https://doi.org/10.1109/ICSENGT.2017.8123411
Citations: 3
Abstract
For text clustering, selecting distinctive text features is important because the feature space is high-dimensional. Reducing the dimensionality of the feature space is essential to increase accuracy and decrease processing time. In this work, we introduce a novel hybrid feature selection model for text clustering. The method measures the importance of each term using the correlation coefficient among four term weighting techniques. All terms in the feature parameter vector are ranked by this correlation coefficient score, and low-scoring terms are filtered out. A clustering technique is then applied to the filtered feature parameter vectors. The results show that the proposed method outperforms traditional feature selection approaches.
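
The ranking-and-filtering pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the four term weighting techniques or say how the correlation coefficient is aggregated, so the sketch assumes raw term frequency, binary presence, log-scaled term frequency, and L2-normalized TF-IDF, and scores each term by the mean pairwise Pearson correlation of its per-document weights under those four schemes. The function names (`pearson`, `weight_matrices`, `term_scores`, `select_terms`) are hypothetical, and the final clustering step is left out.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def weight_matrices(docs):
    """Return (vocab, matrices): four documents-by-terms weight matrices.

    The four schemes are assumptions made for this sketch; the abstract
    does not say which weighting techniques the paper uses.
    """
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}

    tf = [[float(c[t]) for t in vocab] for c in counts]
    binary = [[1.0 if v else 0.0 for v in row] for row in tf]
    log_tf = [[1.0 + math.log(v) if v else 0.0 for v in row] for row in tf]

    # TF-IDF with a +1 smoothed IDF, L2-normalized per document.
    idf = [math.log(n / df[t]) + 1.0 for t in vocab]
    tfidf = []
    for row in tf:
        raw = [v * w for v, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        tfidf.append([v / norm for v in raw])

    return vocab, [tf, binary, log_tf, tfidf]


def term_scores(docs):
    """Score each term by the mean pairwise correlation of its four weight columns."""
    vocab, mats = weight_matrices(docs)
    scores = {}
    for j, term in enumerate(vocab):
        cols = [[row[j] for row in m] for m in mats]
        pairs = list(combinations(cols, 2))
        scores[term] = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return scores


def select_terms(docs, keep):
    """Rank terms by score (highest first) and keep the top `keep` terms."""
    scores = term_scores(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:keep]


if __name__ == "__main__":
    sample = [
        "apple banana apple",
        "banana cherry",
        "apple cherry cherry cherry",
        "date",
    ]
    print(select_terms(sample, keep=3))
```

The retained terms would then define the reduced feature vectors passed to whichever clustering algorithm is used. The cutoff `keep` is a free parameter in this sketch; the abstract does not state how the filtering threshold is chosen.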