Post-model-selection inference in linear regression models: An integrated review

Impact factor: 11.0 | JCR Q1, Statistics & Probability
Dongliang Zhang, Abbas Khalili, M. Asgharian
Statistics Surveys, 2022. DOI: 10.1214/22-ss135 (https://doi.org/10.1214/22-ss135)
Citations: 12

Abstract

The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naïve confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods.

MSC2020 subject classifications: Primary 62F25; secondary 62J07.
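The following Python sketch makes the motivating phenomenon concrete. It simulates a naïve confidence interval computed after a simple data-driven selection step. This is not the authors' simulated example. The setup is an illustrative assumption throughout: two correlated regressors with no intercept, n = 50, the chosen coefficients, a known error variance (so normal rather than t quantiles are used), and a pretest rule that drops x2 when its z-statistic is not significant at the 5% level. `simulate_coverage` is a hypothetical helper. The naïve interval for β1 is built from whichever model was selected, as if that model had been fixed before seeing the data. Its coverage of the population coefficient β1 is then compared with that of the full-model interval fitted without any selection.

```python
import math
import random
from statistics import NormalDist

# Illustrative settings (assumptions for this sketch, not taken from the paper).
N = 50                      # sample size
RHO = 0.8                   # correlation used to generate x2 from x1
BETA1, BETA2 = 1.0, 0.35    # true coefficients; beta2 is small but nonzero
SIGMA = 1.0                 # error standard deviation, treated as known
LEVEL = 0.95
Z = NormalDist().inv_cdf(0.5 + LEVEL / 2)  # two-sided normal critical value (~1.96)


def make_design(rng):
    """Fixed design: two correlated regressors, no intercept, reused in every replication."""
    x1 = [rng.gauss(0, 1) for _ in range(N)]
    x2 = [RHO * a + math.sqrt(1 - RHO**2) * rng.gauss(0, 1) for a in x1]
    return x1, x2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def simulate_coverage(reps=2000, seed=1):
    """Return (naive post-selection coverage, full-model coverage, share of reduced-model picks)."""
    rng = random.Random(seed)
    x1, x2 = make_design(rng)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    det = s11 * s22 - s12**2
    se1_full = SIGMA * math.sqrt(s22 / det)  # sd of beta1-hat in the full model
    se2_full = SIGMA * math.sqrt(s11 / det)  # sd of beta2-hat in the full model
    se1_red = SIGMA / math.sqrt(s11)         # sd of beta1-hat when x2 is dropped

    naive_hits = full_hits = reduced_picks = 0
    for _ in range(reps):
        y = [BETA1 * a + BETA2 * b + rng.gauss(0, SIGMA) for a, b in zip(x1, x2)]
        t1, t2 = dot(x1, y), dot(x2, y)
        # Full-model OLS via the explicit inverse of the 2x2 matrix X'X.
        b1_full = (s22 * t1 - s12 * t2) / det
        b2_full = (s11 * t2 - s12 * t1) / det
        full_hits += abs(b1_full - BETA1) <= Z * se1_full

        # Data-driven selection: drop x2 if its z-statistic is not significant.
        if abs(b2_full / se2_full) <= Z:
            reduced_picks += 1
            est, se = t1 / s11, se1_red
        else:
            est, se = b1_full, se1_full
        # Naive interval: treats the selected model as if it had been fixed in advance.
        naive_hits += abs(est - BETA1) <= Z * se
    return naive_hits / reps, full_hits / reps, reduced_picks / reps


if __name__ == "__main__":
    naive, full, picked = simulate_coverage()
    print(f"naive post-selection coverage: {naive:.3f}")
    print(f"full-model (no selection) coverage: {full:.3f}")
    print(f"share of replications selecting the reduced model: {picked:.3f}")
```

Under these settings, the naïve post-selection interval covers β1 noticeably less often than the nominal 95%, while the full-model interval stays close to 95%. Most of the shortfall comes from replications that select the reduced model. There, the estimator t1/s11 is centred at β1 + (s12/s11)β2, which is the projection-based coefficient of x1 in the submodel {x1}, rather than at the population coefficient β1.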
Source journal: Statistics Surveys (Statistics & Probability)
CiteScore: 11.70
Self-citation rate: 0.00%
Articles published: 5
Journal description: Statistics Surveys publishes survey articles in theoretical, computational, and applied statistics. The style of articles may range from reviews of recent research to graduate textbook exposition. Articles may be broad or narrow in scope. The essential requirements are a well specified topic and target audience, together with clear exposition. Statistics Surveys is sponsored by the American Statistical Association, the Bernoulli Society, the Institute of Mathematical Statistics, and the Statistical Society of Canada.