Title: A modified Support Vector Clustering method for document categorization
Authors: B. Harish, M. Revanasiddappa, S. A. Aruna Kumar
Venue: 2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)
Publication date: 2016-09-01
DOI: 10.1109/ICKEA.2016.7802982
Citation count: 9
Abstract
In this paper, we propose a novel text categorization method based on a modified Support Vector Clustering (SVC). SVC is a density-based clustering approach that handles clusters of arbitrary shape effectively. The main drawback of traditional SVC is that it treats unclassified documents as outliers. To overcome this problem, we employ Fuzzy C-Means (FCM) to cluster the unclassified documents. The modified SVC (SVC-FCM) is then applied to categorize text documents. The proposed method consists of three steps. In the first step, Regularized Locality Preserving Indexing (RLPI) is applied to the Term Document Matrix (TDM) to reduce the dimensionality of the features. In the second step, we use SVC to find the base-cluster centers of the documents. Finally, we use FCM to cluster the unclassified documents. To evaluate the performance of the proposed method, we conducted experiments on the standard 20-NewsGroup dataset.
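
The final step can be illustrated with a minimal sketch of standard Fuzzy C-Means in plain Python. It is not the authors' implementation: the RLPI and SVC steps are not shown, the initial centers are simply assumed to be the base-cluster centers that SVC would supply, and the function names, the fuzzifier `m = 2`, and the stopping tolerance are all illustrative assumptions.

```python
import math


def _memberships(points, centers, exponent):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a center: share full membership
            # among the coinciding centers and give none to the others.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            inv = [d ** -exponent for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fcm(points, initial_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard Fuzzy C-Means seeded with given centers (assumed to come from SVC).

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i.
    """
    centers = [list(c) for c in initial_centers]
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        u = _memberships(points, centers, exponent)
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                # No point has any membership here: keep the old center.
                new_centers.append(list(old))
                continue
            new_centers.append([
                sum(w * x[j] for w, x in zip(weights, points)) / total
                for j in range(len(old))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, exponent)


# Toy usage: six 2-D "documents" (after dimensionality reduction) and two
# hypothetical base-cluster centers standing in for the SVC output.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(docs, initial_centers=[(0, 0), (10, 10)])
labels = [row.index(max(row)) for row in u]
```

Taking the largest membership per document, as in `labels`, turns the fuzzy assignment into a hard category; documents with no clearly dominant membership could instead be flagged for review.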