Multi-Class L2,1-Norm Support Vector Machine

2011 IEEE 11th International Conference on Data Mining | Pub Date: 2011-12-11 | DOI: 10.1109/ICDM.2011.105

Xiao Cai, F. Nie, Heng Huang, C. Ding


Citations: 57

Abstract

Feature selection is an essential component of data mining. In many data analysis tasks where the number of data points is much smaller than the number of features, efficient feature selection approaches are needed to extract meaningful features and to eliminate redundant ones. In previous studies, many data mining techniques have been applied to tackle this challenging problem. In this paper, we propose a new $\ell_{2,1}$-norm SVM, that is, a multi-class hinge loss with a structured regularization term shared by all classes, which naturally selects features for multi-class problems without resorting to additional heuristic strategies. The minimization of the multi-class hinge loss with $\ell_{2,1}$-norm regularization has not been solved before because of its optimization difficulty. Rather than solving it directly, we are the first to give an efficient algorithm that bridges the new problem with a previously solvable optimization problem in order to perform multi-class feature selection. A global convergence proof for our method is also presented. Via the proposed efficient algorithm, we select features across multiple classes with joint sparsity, i.e., each feature has either a small or a large score over all classes. Comprehensive experiments have been performed on six bioinformatics data sets to show that our method obtains better or competitive performance compared with existing state-of-the-art multi-class feature selection approaches.
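To make the joint sparsity idea concrete, the following minimal sketch computes the $\ell_{2,1}$ norm of a weight matrix and ranks features by their row norms. It is an illustration under stated assumptions, not the authors' algorithm: the layout of `W` (one row per feature, one column per class), the helper names, and the toy numbers are all hypothetical, and the hinge loss shown is the Crammer-Singer variant, which may differ from the exact loss used in the paper.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (assumed layout: rows = features, columns = classes)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """l2,1 norm: the sum of the row-wise Euclidean norms.
    Penalizing it pushes entire rows toward zero, so a feature is
    kept or dropped for all classes at once."""
    return sum(row_norms(W))

def multiclass_hinge(scores, label):
    """Crammer-Singer style multi-class hinge loss for one sample.
    This variant is an assumption; the abstract does not spell out the loss."""
    violations = [1.0 + s - scores[label] for k, s in enumerate(scores) if k != label]
    return max(0.0, max(violations))

def top_features(W, k):
    """Indices of the k features with the largest row norm (hypothetical helper)."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda j: norms[j], reverse=True)[:k]

# Toy weight matrix: 4 features, 3 classes (made-up values).
W = [
    [3.0, 4.0, 0.0],  # large score across classes
    [0.0, 0.0, 0.0],  # feature discarded for every class
    [0.1, 0.0, 0.0],  # nearly discarded
    [1.0, 2.0, 2.0],  # large score across classes
]

print(row_norms(W))
print(l21_norm(W))
print(top_features(W, 2))
print(multiclass_hinge([2.0, 0.5, 1.8], label=0))
```

Ranking features by row norm reflects the property stated in the abstract: because the penalty acts on whole rows, each feature ends up with either a small or a large score over all classes, so a single threshold or top-k cut selects features for every class simultaneously.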

