The Data Selection Criteria for HSC and SVM Algorithms
Authors: Qing He, Fuzhen Zhuang, Zhongzhi Shi
Published in: 2008 Fourth International Conference on Natural Computation, volume 30 1, pages 384-388
Publication date: 2008-10-18
DOI: 10.1109/ICNC.2008.334 (https://doi.org/10.1109/ICNC.2008.334)
Citations: 2
Abstract
This paper discusses selection criteria for consistent subsets (CS) for the hyper surface classification (HSC) and SVM algorithms. Consistent subsets play an important role in data selection. First, the paper proposes that the minimal consistent subset for a disjoint cover set (MCSC) plays an important role in data selection for HSC: it can be used to select a representative subset of the original sample set. An MCSC yields the same classification model as the entire sample set and fully reflects its classification ability. Second, the number of MCSCs is calculated. Third, by comparing the performance of HSC and SVM on the corresponding CS, we argue that it is not reasonable to train different classifiers on the same training set and then evaluate them all on the same test set. The experiments show that each algorithm can select its own suitable training set, which ensures good performance and generalization ability. The MCSC is the best selection for HSC, and the support vector set is the effective selection for SVM.
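The idea of a consistent subset that reproduces the full-data model can be illustrated with a toy sketch. This is not the HSC algorithm from the paper: it assumes, purely for illustration, that the disjoint cover is a fixed grid of single-class cells, that the "model" is the map from occupied cell to label, and that a minimal consistent subset keeps exactly one sample per occupied cell. The function names, the grid size, and the sample points are all hypothetical.

```python
from collections import defaultdict
from math import prod


def cell_of(point, size=1.0):
    """Map a 2-D point to the index of the grid cell that contains it."""
    x, y = point
    return (int(x // size), int(y // size))


def build_cover(samples, size=1.0):
    """Group labelled samples by grid cell (assumption: every cell is single-class)."""
    cover = defaultdict(list)
    for point, label in samples:
        cover[cell_of(point, size)].append((point, label))
    for members in cover.values():
        assert len({label for _, label in members}) == 1, "cell is not single-class"
    return dict(cover)


def fit(samples, size=1.0):
    """Toy model: each occupied cell predicts the label of the samples inside it."""
    return {cell: members[0][1] for cell, members in build_cover(samples, size).items()}


def minimal_subset(samples, size=1.0):
    """Keep one representative per occupied cell (here, the first one seen)."""
    return [members[0] for members in build_cover(samples, size).values()]


def count_minimal_subsets(samples, size=1.0):
    """Under the toy assumptions, each cell contributes an independent choice."""
    return prod(len(members) for members in build_cover(samples, size).values())


# Hypothetical data: three occupied cells holding 3, 2, and 1 samples.
samples = [
    ((0.2, 0.3), "a"), ((0.7, 0.9), "a"), ((0.5, 0.5), "a"),
    ((1.4, 0.2), "b"), ((1.8, 0.6), "b"),
    ((0.1, 1.5), "b"),
]

subset = minimal_subset(samples)
same_model = fit(subset) == fit(samples)
n_subsets = count_minimal_subsets(samples)
```

Under these assumptions, the model fitted on the reduced subset equals the model fitted on all samples, and the number of distinct minimal subsets is the product of the per-cell sample counts. Whether the paper's count of MCSCs takes this form depends on its definition of the cover, which the abstract does not spell out.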