Subsampling from features in large regression to find “winning features”

Yiying Fan, Jiayang Sun
{"title":"Subsampling from features in large regression to find “winning features”","authors":"Yiying Fan, Jiayang Sun","doi":"10.1002/sam.11499","DOIUrl":null,"url":null,"abstract":"Feature (or variable) selection from a large number of p features continuously challenges data science, especially for ever‐enlarging data and in discovering scientifically important features in a regression setting. For example, to develop valid drug targets for ovarian cancer, we must control the false‐discovery rate (FDR) of a selection procedure. The popular approach to feature selection in large‐p regression uses a penalized likelihood or a shrinkage estimation, such as a LASSO, SCAD, Elastic Net, or MCP procedure. We present a different approach called the Subsampling Winner algorithm (SWA), which subsamples from p features. The idea of SWA is analogous to selecting US national merit scholars' that selects semifinalists based on student's performance in tests done at local schools (a.k.a. subsample analyses), and then determine the finalists (a.k.a. winning features) from the semifinalists. Due to its subsampling nature, SWA can scale to data of any dimension. SWA also has the best‐controlled FDR compared to the penalized and Random Forest procedures while having a competitive true‐feature discovery rate. 
Our application of SWA to an ovarian cancer data revealed functionally important genes and pathways.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining: The ASA Data Science Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/sam.11499","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Feature (or variable) selection from a large number p of features continually challenges data science, especially for ever‐enlarging data and for discovering scientifically important features in a regression setting. For example, to develop valid drug targets for ovarian cancer, we must control the false‐discovery rate (FDR) of the selection procedure. The popular approach to feature selection in large‐p regression uses a penalized likelihood or shrinkage estimation, such as a LASSO, SCAD, Elastic Net, or MCP procedure. We present a different approach called the Subsampling Winner algorithm (SWA), which subsamples from the p features. The idea of SWA is analogous to the selection of US National Merit Scholars: semifinalists are chosen based on students' performance in tests at local schools (a.k.a. subsample analyses), and the finalists (a.k.a. winning features) are then determined from the semifinalists. Due to its subsampling nature, SWA can scale to data of any dimension. SWA also has the best‐controlled FDR compared with the penalized and Random Forest procedures, while maintaining a competitive true‐feature discovery rate. Our application of SWA to ovarian cancer data revealed functionally important genes and pathways.
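The two-stage "local contests, then a final" idea described above can be sketched in code. This is an illustrative sketch in the spirit of SWA, not the authors' exact algorithm: all function names, scoring rules, and parameter defaults here are assumptions chosen for a minimal demonstration (scoring features by the size of their least-squares coefficients within many small random subsamples, then ranking the semifinalists by a joint fit).

```python
import numpy as np

def subsampling_winner_sketch(X, y, subsample_size=5, n_subsamples=300,
                              n_semifinalists=10, rng=None):
    """Illustrative two-stage selection in the spirit of SWA (not the
    authors' exact procedure): score features inside many small
    subsample regressions, keep the top scorers as semifinalists,
    then rank the finalists by a joint fit on the semifinalists only."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    scores = np.zeros(p)
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # A "local contest": fit a small regression on a random
        # subsample of the features and credit each entrant by the
        # magnitude of its estimated coefficient.
        idx = rng.choice(p, size=subsample_size, replace=False)
        beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        scores[idx] += np.abs(beta)
        counts[idx] += 1
    # Average performance across the contests each feature entered.
    avg = np.divide(scores, counts, out=np.zeros(p), where=counts > 0)
    semifinalists = np.argsort(avg)[-n_semifinalists:]
    # The "final": one joint regression on the semifinalists, with
    # winners ranked by coefficient magnitude.
    beta, *_ = np.linalg.lstsq(X[:, semifinalists], y, rcond=None)
    order = np.argsort(-np.abs(beta))
    return semifinalists[order]
```

Because each pass touches only `subsample_size` features at a time, memory and per-fit cost do not grow with p, which is the property that lets a subsampling scheme of this kind scale to data of any dimension.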