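Null Distribution of the Test Statistic for Model Selection via Marginal Screening: Implications for Multivariate Regression Analysis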

A. V. Rubanovich, V. Saenko
Global Journal of Science Frontier Research. Published 2021-10-21. DOI: https://doi.org/10.34257/gjsfrgvol21is1pg23

Abstract

Marginal screening (MS) is a computationally simple and commonly used dimension reduction procedure. In it, a linear model is constructed for several top predictors, chosen according to the absolute value of their marginal correlations with the dependent variable. Importantly, when k predictors out of m primary covariates are selected, standard regression analysis may yield false-positive results if m >> k (Freedman's paradox). In this work, we provide analytical expressions describing the null distribution of the test statistic for model selection via MS. Using the theory of order statistics, we show that under MS, the common F-statistic is distributed as the mean of the k largest of m independent random variables, each having a χ² distribution with one degree of freedom. Based on this finding, we estimated critical p-values for multiple regression models after MS; comparing the p-values obtained in real studies against these critical values will help researchers avoid false-positive results. The analytical solutions obtained in this work are implemented in a free Excel spreadsheet program.
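The distributional claim in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' analytical solution or their Excel implementation; it only simulates the stated null model (the mean of the k largest of m independent χ² variables with one degree of freedom) to show how the critical value grows with m. The function names, the default number of simulations, and the seed are assumptions made for illustration.

```python
import random


def top_k_mean_chi2(m, k, rng):
    """Draw m independent chi-square(1) variables; return the mean of the k largest."""
    # A squared standard normal draw has a chi-square distribution with 1 d.f.
    draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(m)]
    draws.sort(reverse=True)
    return sum(draws[:k]) / k


def simulate_critical_value(m, k, alpha=0.05, n_sim=5000, seed=0):
    """Monte Carlo estimate of the upper-alpha quantile of the top-k mean."""
    if not 1 <= k <= m:
        raise ValueError("require 1 <= k <= m")
    rng = random.Random(seed)
    stats = sorted(top_k_mean_chi2(m, k, rng) for _ in range(n_sim))
    # Index of the empirical (1 - alpha) quantile, clamped to the last element.
    index = min(n_sim - 1, int((1.0 - alpha) * n_sim))
    return stats[index]


if __name__ == "__main__":
    k = 3
    for m in (3, 10, 50):
        crit = simulate_critical_value(m, k)
        print(f"m={m:3d}, k={k}: simulated 5% critical value = {crit:.2f}")
```

When m = k there is no selection, and the statistic reduces to a χ² variable with k degrees of freedom divided by k. For k = 3 the simulated value should therefore fall near 7.815 / 3 ≈ 2.6. As m grows with k fixed, the simulated critical value rises, which is the inflation that the abstract attributes to Freedman's paradox.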