Data-Mining Homogeneous Subgroups in Multiple Regression When Heteroscedasticity, Multicollinearity, and Missing Variables Confound Predictor Effects

Impact Factor 0.5 · JCR Q4 · Mathematics, Interdisciplinary Applications
R. Francoeur
DOI: 10.1142/s2424922x20410041
Journal: Advances in Data Science and Adaptive Analysis, 24(1), pp. 2041004:1-2041004:59
Published: 2020-09-05 (Journal Article)
Citations: 0

Abstract

Multiple regression is not reliable for recovering predictor slopes within homogeneous subgroups of heterogeneous samples. In contrast to Monte Carlo analysis, which assigns to the first-specified predictor all of the variation it shares with the remaining predictors, multiple regression does not assign this shared variation to any predictor; it is sequestered in the residual term. This unassigned and confounding variation may correlate with specified predictors, lead to heteroscedasticity, and distort multicollinearity. I develop and test an iterative, sequential algorithm that estimates a two-part series of weighted least-squares (WLS) multiple regressions to recover the Monte Carlo predictor slopes in three homogeneous subgroups (each generated with 500 observations) of a heterogeneous sample [Formula: see text]. Each variable has a different nonnormal distribution. The algorithm mines each subgroup and then adjusts bias within it from 1) heteroscedasticity related to one, some, or all specified predictors and 2) "nonessential" multicollinearity. It recovers all three specified predictor slopes across the three subgroups in two scenarios, one of which is also influenced by two unspecified predictors. The algorithm extends adaptive analysis to discover and appraise patterns in field research and machine learning when predictors are inter-correlated, and even unspecified, in order to reveal unbiased outcome clusters in heterogeneous and homogeneous samples with nonnormal outcomes and predictors.
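The two bias adjustments named in the abstract can be sketched minimally. The snippet below is an illustrative reconstruction, not the paper's algorithm: it simulates one subgroup of 500 observations with a skewed (gamma-distributed) predictor and errors whose variance grows with the predictor (an assumed setup), runs an OLS pass, models the error variance from the log squared residuals to build WLS weights (one common variance-function approach to predictor-related heteroscedasticity), and then mean-centers the predictor before squaring it, which is the standard remedy for "nonessential" multicollinearity between a variable and its own power terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Assumed subgroup: skewed predictor, error sd increasing with x
# (heteroscedasticity tied to a specified predictor).
x = rng.gamma(shape=2.0, scale=1.0, size=n)
sigma = 0.5 + 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)   # true slope = 2

def ols(X, y):
    """Least-squares fit via np.linalg.lstsq; returns coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

X = np.column_stack([np.ones(n), x])

# Pass 1: OLS to obtain residuals.
b_ols = ols(X, y)
resid = y - X @ b_ols

# Pass 2: regress log squared residuals on the predictors to model the
# variance function, then weight each row by its estimated inverse variance.
g = ols(X, np.log(resid**2 + 1e-12))
w = 1.0 / np.exp(X @ g)

# WLS = OLS on rows rescaled by sqrt(weight).
sw = np.sqrt(w)
b_wls = ols(X * sw[:, None], y * sw)
print("OLS slope:", b_ols[1], " WLS slope:", b_wls[1])

# "Nonessential" multicollinearity: x and x**2 are strongly correlated
# unless x is mean-centered before forming the power term.
xc = x - x.mean()
r_raw = np.corrcoef(x, x**2)[0, 1]
r_cent = np.corrcoef(xc, xc**2)[0, 1]
print("corr(x, x^2):", r_raw, " corr(xc, xc^2):", r_cent)
```

With heteroscedastic errors OLS remains unbiased but inefficient; the WLS pass mainly tightens the slope estimate, while the centering step shrinks the correlation between a predictor and its square so that shared variation is no longer artificially sequestered.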