Selective inference after feature selection via multiscale bootstrap

IF 0.8 · CAS Tier 4 (Mathematics) · JCR Q3, STATISTICS & PROBABILITY
Yoshikazu Terada, Hidetoshi Shimodaira
DOI: 10.1007/s10463-022-00838-2
Journal: Annals of the Institute of Statistical Mathematics
Published: 2022-07-30 (Journal Article)
Full text: https://link.springer.com/article/10.1007/s10463-022-00838-2
Citations: 2

Abstract


It is common to show the confidence intervals or p-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature selection, and thus they have difficulties in handling more complicated algorithms. Moreover, existing studies often consider unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely applicable resampling method via multiscale bootstrap addresses these issues to compute an approximately unbiased selective p-value for the selected features. As a simplification of the proposed method, we also develop a simpler method via the classical bootstrap. We prove that the p-value computed by our multiscale bootstrap method is more accurate than the classical bootstrap method. Furthermore, numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods such as non-convex regularization.
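The multiscale idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the selection rule below (top-k absolute marginal correlation) is a simplified, hypothetical stand-in for Lasso or non-convex regularization, and the function computes only one ingredient of the machinery, an approximately unbiased p-value for a selection event, obtained by estimating bootstrap selection frequencies at several scales σ² = n/n′ and extrapolating to σ² = −1 in the spirit of Shimodaira's multiscale bootstrap. The paper's full procedure combines such quantities to condition on the selection event.

```python
import numpy as np
from statistics import NormalDist  # stdlib normal CDF and quantile

nd = NormalDist()

def select_topk(X, y, k=2):
    """Toy selection rule (stand-in for Lasso): indices of the k features
    with the largest absolute marginal correlation with y."""
    corr = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) * np.linalg.norm(y))
    return set(np.argsort(corr)[-k:])

def multiscale_selective_pvalue(X, y, feature,
                                scales=(0.5, 0.75, 1.0, 1.5, 2.0),
                                B=2000, rng=None):
    """Approximately unbiased p-value for the event {`feature` is selected},
    by fitting sigma * z(sigma^2) = v + c * sigma^2 to multiscale bootstrap
    frequencies and extrapolating to sigma^2 = -1."""
    rng = np.random.default_rng(rng)
    n = len(y)
    s2s, zs = [], []
    for s2 in scales:                    # sigma^2 = n / n'
        m = max(4, int(round(n / s2)))   # resample size n'
        hits = 0
        for _ in range(B):
            idx = rng.integers(0, n, size=m)
            if feature in select_topk(X[idx], y[idx]):
                hits += 1
        bp = min(max(hits / B, 1 / B), 1 - 1 / B)     # clip away 0 and 1
        zs.append(np.sqrt(s2) * nd.inv_cdf(1 - bp))   # sigma * z(sigma^2)
        s2s.append(s2)
    c, v = np.polyfit(s2s, zs, 1)        # slope c (curvature), intercept v
    return 1 - nd.cdf(v - c)             # value of the fit at sigma^2 = -1
```

The key contrast with the classical bootstrap is the extrapolation step: the classical method uses only the scale σ² = 1 (resample size n′ = n), while fitting across several scales and evaluating the fit at σ² = −1 removes the leading bias term, which is why the multiscale p-value is more accurate.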

Journal metrics: CiteScore 2.00 · Self-citation rate 0.00% · Articles per year: 39 · Review time: 6-12 weeks
Journal description: Annals of the Institute of Statistical Mathematics (AISM) aims to provide a forum for open communication among statisticians, and to contribute to the advancement of statistics as a science to enable humans to handle information in order to cope with uncertainties. It publishes high-quality papers that shed new light on the theoretical, computational and/or methodological aspects of statistical science. Emphasis is placed on (a) development of new methodologies motivated by real data, (b) development of unifying theories, and (c) analysis and improvement of existing methodologies and theories.