Random effects misspecification and its consequences for prediction in generalized linear mixed models

IF 1.6 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis | Pub Date: 2025-07-29 | DOI: 10.1016/j.csda.2025.108254

Quan Vu, Francis K.C. Hui, Samuel Muller, A.H. Welsh

Volume 213, Article 108254. https://www.sciencedirect.com/science/article/pii/S0167947325001306

Citations: 0

Abstract

When fitting generalized linear mixed models, choosing the random effects distribution is an important decision. As random effects are unobserved, misspecification of their distribution is a real possibility. Thus, the consequences of random effects misspecification for point prediction and prediction inference of random effects in generalized linear mixed models need to be investigated. A combination of theory, simulation, and a real application is used to explore the effect of using the common normality assumption for the random effects distribution when the correct specification is a mixture of normal distributions, focusing on the impacts on point prediction, mean squared prediction errors, and prediction intervals. Results show that the level of shrinkage for the predicted random effects can differ greatly under the two random effect distributions, and so is susceptible to misspecification. Also, the unconditional mean squared prediction errors for the random effects are almost always larger under the misspecified normal random effects distribution, while results for the mean squared prediction errors conditional on the random effects are more complicated but remain generally larger under the misspecified distribution (especially when the true random effect is close to the mean of one of the component distributions in the true mixture distribution). Results for prediction intervals indicate that the overall coverage probability is, in contrast, not greatly impacted by misspecification. It is concluded that misspecifying the random effects distribution can affect prediction of random effects, and greater caution is recommended when adopting the normality assumption in generalized linear mixed models.
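The shrinkage contrast described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or method: it assumes the simplest special case of a generalized linear mixed model (Gaussian response, identity link, one random intercept per cluster), with all variance components treated as known, so that both predictors have closed forms. The function names `predict_normal` and `predict_mixture`, and all numerical values, are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predict_normal(y_bar, tau2, s2):
    """Posterior mean of b given y_bar when b ~ N(0, tau2) and
    y_bar | b ~ N(b, s2). The cluster mean is shrunk toward 0."""
    return tau2 / (tau2 + s2) * y_bar


def predict_mixture(y_bar, weights, means, variances, s2):
    """Posterior mean of b given y_bar when b follows a finite mixture of
    normals and y_bar | b ~ N(b, s2)."""
    # Marginally, y_bar under component k is N(mean_k, var_k + s2).
    marginal = [w * normal_pdf(y_bar, m, v + s2)
                for w, m, v in zip(weights, means, variances)]
    total = sum(marginal)
    posterior_weights = [g / total for g in marginal]
    # Within component k, y_bar is shrunk toward that component's mean.
    component_means = [m + v / (v + s2) * (y_bar - m)
                       for m, v in zip(means, variances)]
    return sum(p * c for p, c in zip(posterior_weights, component_means))


# Hypothetical two-component mixture with overall mean zero.
weights = [0.5, 0.5]
means = [-2.0, 2.0]
variances = [0.25, 0.25]
s2 = 1.0  # assumed known sampling variance of the cluster mean

# Misspecified normal model: match the mean and variance of the mixture.
mix_mean = sum(w * m for w, m in zip(weights, means))
tau2 = sum(w * (v + m ** 2)
           for w, m, v in zip(weights, means, variances)) - mix_mean ** 2

for y_bar in (0.0, 1.0, 2.0, 3.0):
    b_normal = predict_normal(y_bar, tau2, s2)
    b_mixture = predict_mixture(y_bar, weights, means, variances, s2)
    print(f"y_bar={y_bar:4.1f}  normal={b_normal:7.4f}  mixture={b_mixture:7.4f}")
```

Under these made-up values, the normal predictor shrinks every cluster mean toward zero by the same factor, whereas the mixture predictor shrinks toward whichever component mean is more plausible given the data. The two predictions can therefore differ noticeably for the same observed cluster mean.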




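Source journal

Computational Statistics & Data Analysis (Mathematics; Computer Science: Interdisciplinary Applications)

CiteScore: 3.70
Self-citation rate: 5.60%
Articles published: 167
Review time: 60 days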

About the journal: Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]