稳定特征选择的方差缩减框架

Yue Han, Lei Yu
{"title":"稳定特征选择的方差缩减框架","authors":"Yue Han, Lei Yu","doi":"10.1002/sam.11152","DOIUrl":null,"url":null,"abstract":"Besides high accuracy, stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin based instance weighting, which weights training instances according to their influence to the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world micro array data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.","PeriodicalId":294061,"journal":{"name":"2010 IEEE International Conference on Data Mining","volume":"104 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"71","resultStr":"{\"title\":\"A Variance Reduction Framework for Stable Feature Selection\",\"authors\":\"Yue Han, Lei Yu\",\"doi\":\"10.1002/sam.11152\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Besides high accuracy, stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin based instance weighting, which weights training instances according to their influence to the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world micro array data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.\",\"PeriodicalId\":294061,\"journal\":{\"name\":\"2010 IEEE International Conference on Data Mining\",\"volume\":\"104 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"71\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/sam.11152\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/sam.11152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 71

摘要

除了高精度之外,特征选择的稳定性近年来引起了人们对高维数据知识发现的浓厚兴趣。在这项研究中,我们提出了一个基于特征选择误差的正式偏差方差分解的特征选择稳定性和准确性之间关系的理论框架。该框架还提出了一种方差减小方法来提高特征选择算法的稳定性。此外,我们提出了一种经验方差减少框架,即基于边际的实例加权,该框架根据训练实例对特征相关性估计的影响对其进行加权。并在此框架下开发了一种高效的算法。基于合成数据和实际微阵列数据的实验验证了该算法的理论框架和方差缩减的有效性。该算法还被证明在提高子集稳定性方面是有效的,同时保持基于所选特征的相当分类精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Variance Reduction Framework for Stable Feature Selection
Besides high accuracy, stability of feature selection has recently attracted strong interest in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also suggests a variance reduction approach for improving the stability of feature selection algorithms. Furthermore, we propose an empirical variance reduction framework, margin based instance weighting, which weights training instances according to their influence to the estimation of feature relevance. We also develop an efficient algorithm under this framework. Experiments based on synthetic data and real-world micro array data verify both the theoretical framework and the effectiveness of the proposed algorithm on variance reduction. The proposed algorithm is also shown to be effective at improving subset stability, while maintaining comparable classification accuracy based on selected features.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信