特征导入向量机:一种具有灵活特征选择的通用分类器。

IF 3.6 4区数学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Statistical Analysis and Data Mining Pub Date : 2015-02-01 Epub Date: 2015-01-26 DOI:10.1002/sam.11259

Samiran Ghosh, Yazhen Wang

{"title":"特征导入向量机:一种具有灵活特征选择的通用分类器。","authors":"Samiran Ghosh, Yazhen Wang","doi":"10.1002/sam.11259","DOIUrl":null,"url":null,"abstract":"The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems are drawing much attention recently due to its robustness and generalization capability. General theme here is to construct classifiers based on the training data in a high dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only few observations which lie close to the boundary of the classifier function. However when the number of observations are not very large (small n) but the number of dimensions/features are large (large p), then it is not necessary that all available features are of equal importance in the classification context. Possible selection of an useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such an optimal set of features could be selected. In short, we reverse the traditional sequential observation selection strategy of SVM to that of sequential feature selection. To achieve this we have modified the solution proposed by Zhu and Hastie (2005) in the context of import vector machine (IVM), to select an optimal sub-dimensional model to build the final classifier with sufficient accuracy.","PeriodicalId":48684,"journal":{"name":"Statistical Analysis and Data Mining","volume":"8 1","pages":"49-63"},"PeriodicalIF":3.6000,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/sam.11259","citationCount":"3","resultStr":"{\"title\":\"Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.\",\"authors\":\"Samiran Ghosh, Yazhen Wang\",\"doi\":\"10.1002/sam.11259\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems are drawing much attention recently due to its robustness and generalization capability. General theme here is to construct classifiers based on the training data in a high dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only few observations which lie close to the boundary of the classifier function. However when the number of observations are not very large (small n) but the number of dimensions/features are large (large p), then it is not necessary that all available features are of equal importance in the classification context. Possible selection of an useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such an optimal set of features could be selected. In short, we reverse the traditional sequential observation selection strategy of SVM to that of sequential feature selection. To achieve this we have modified the solution proposed by Zhu and Hastie (2005) in the context of import vector machine (IVM), to select an optimal sub-dimensional model to build the final classifier with sufficient accuracy.\",\"PeriodicalId\":48684,\"journal\":{\"name\":\"Statistical Analysis and Data Mining\",\"volume\":\"8 1\",\"pages\":\"49-63\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2015-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1002/sam.11259\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Statistical Analysis and Data Mining\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1002/sam.11259\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2015/1/26 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/sam.11259","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2015/1/26 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 3

摘要

支持向量机(SVM)和其他基于再现核希尔伯特空间(RKHS)的分类器系统由于其鲁棒性和泛化能力近年来备受关注。这里的一般主题是利用所有可用的维度，在高维空间中基于训练数据构建分类器。支持向量机通过选择靠近分类器函数边界的少量观测值来实现巨大的数据压缩。然而，当观测值的数量不是很大(小n)，但维度/特征的数量很大(大p)时，则不一定所有可用的特征在分类上下文中都同等重要。可能选择可用特征的有用部分可能导致巨大的数据压缩。在本文中，我们提出了一种算法方法，通过这种方法可以选择这样一个最优的特征集。简而言之，我们将传统的支持向量机序列观测选择策略逆转为序列特征选择策略。为了实现这一点，我们修改了Zhu和Hastie(2005)在导入向量机(IVM)背景下提出的解决方案，以选择最优的子维度模型来构建具有足够精度的最终分类器。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.

查看原文本刊更多论文

Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.

The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems are drawing much attention recently due to its robustness and generalization capability. General theme here is to construct classifiers based on the training data in a high dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only few observations which lie close to the boundary of the classifier function. However when the number of observations are not very large (small n) but the number of dimensions/features are large (large p), then it is not necessary that all available features are of equal importance in the classification context. Possible selection of an useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such an optimal set of features could be selected. In short, we reverse the traditional sequential observation selection strategy of SVM to that of sequential feature selection. To achieve this we have modified the solution proposed by Zhu and Hastie (2005) in the context of import vector machine (IVM), to select an optimal sub-dimensional model to build the final classifier with sufficient accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Statistical Analysis and Data Mining COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

3.20

自引率

7.70%

发文量

期刊介绍： Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce. The focus of the journal is on papers which satisfy one or more of the following criteria: Solve data analysis problems associated with massive, complex datasets Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operation research. Formulate and solve high-impact real-world problems which challenge existing paradigms via new statistical and/or computational models Provide survey to prominent research topics.