{"title":"On the probability of feature selection in support vector classification","authors":"Qunfeng Liu, Lan Yao","doi":"10.1109/SOLI.2013.6611436","DOIUrl":null,"url":null,"abstract":"Feature selection is important for classification problem, especially when the number of features is very large or noisiness is present in data. Support vector machine (SVM) with Lp regularization is a popular approach for feature selection. Many researches have devoted to develop efficient methods to solve the optimization problem in support vector machine. However, to our knowledge, there is still no formal proof or comprehensive mathematical understanding on how Lp regularization can bring feature selection. In this paper, we first show that feature selection depends not only the parameter p but also the data itself. If the feasible region generated from the data lies faraway relatively from the coordinates, then feature selection maybe impossible for any p. Otherwise, a small p can help to enhance the ability of feature selection of Lp-SVM. Then we provide a formula for computing the probabilities which measure the feature selection ability. The only assumption is that the optimal solutions of all possible classification problems distribute uniformly on the contour of the objective function. 
Based on this formula, we compute the probabilities for some popular p.","PeriodicalId":147180,"journal":{"name":"Proceedings of 2013 IEEE International Conference on Service Operations and Logistics, and Informatics","volume":"113 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 2013 IEEE International Conference on Service Operations and Logistics, and Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SOLI.2013.6611436","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Feature selection is important for classification problems, especially when the number of features is very large or the data are noisy. The support vector machine (SVM) with Lp regularization is a popular approach to feature selection. Much research has been devoted to developing efficient methods for solving the resulting optimization problem. However, to our knowledge, there is still no formal proof or comprehensive mathematical understanding of how Lp regularization brings about feature selection. In this paper, we first show that feature selection depends not only on the parameter p but also on the data itself. If the feasible region generated from the data lies relatively far from the coordinate axes, then feature selection may be impossible for any p. Otherwise, a small p can enhance the feature selection ability of the Lp-SVM. We then provide a formula for computing the probabilities that measure this feature selection ability. The only assumption is that the optimal solutions of all possible classification problems are distributed uniformly on the contour of the objective function. Based on this formula, we compute the probabilities for some popular values of p.
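To illustrate the intuition behind the abstract's claim, the sketch below (not from the paper; a standard numerical experiment under assumed synthetic data) compares an L1 penalty (p = 1) with an L2 penalty (p = 2) on a linear model where only the first of three features is informative. The L1 penalty's contour has corners on the coordinate axes, so its proximal (soft-thresholding) update drives the noise weights exactly to zero, i.e. selects features; the smooth L2 contour only shrinks them.

```python
import numpy as np

# Synthetic data: 200 samples, 3 features; only feature 0 is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

lam, step = 0.5, 0.01
w_l1 = np.zeros(3)
w_l2 = np.zeros(3)
for _ in range(2000):
    # L1 (p = 1): proximal gradient step; soft-thresholding can
    # set small weights exactly to zero -> feature selection.
    grad = X.T @ (X @ w_l1 - y) / len(y)
    w_l1 = w_l1 - step * grad
    w_l1 = np.sign(w_l1) * np.maximum(np.abs(w_l1) - step * lam, 0.0)

    # L2 (p = 2): plain gradient step on a smooth penalty; weights
    # shrink toward zero but generically stay nonzero.
    grad = X.T @ (X @ w_l2 - y) / len(y) + lam * w_l2
    w_l2 = w_l2 - step * grad

print("L1 weights:", np.round(w_l1, 3))  # noise weights exactly 0
print("L2 weights:", np.round(w_l2, 3))  # noise weights small but nonzero
```

This mirrors the paper's point: whether a given p yields sparsity depends on where the data-induced feasible region sits relative to the coordinate axes, and a smaller p makes zero coordinates more likely.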