Bootstrap-based statistical inference for linear mixed effects under misspecifications
Katarzyna Reluga, María-José Lombardía, Stefan Sperlich
Computational Statistics & Data Analysis, volume 199, Article 108014. Published 2024-07-01. DOI: 10.1016/j.csda.2024.108014
https://www.sciencedirect.com/science/article/pii/S0167947324000987
Citations: 0
Abstract
Linear mixed effects are considered excellent predictors of cluster-level parameters in various domains. However, previous research has demonstrated that their performance is affected by departures from model assumptions. Given the common occurrence of these departures in empirical studies, there is a need for inferential methods that are robust to misspecifications while remaining accessible and appealing to practitioners. Statistical tools have been developed for cluster-wise and simultaneous inference for mixed effects under distributional misspecifications, employing a user-friendly semiparametric random effect bootstrap. The merits and limitations of this approach are discussed in the general context of model misspecification. Theoretical analysis demonstrates the asymptotic consistency of the methods under general regularity conditions. Simulations show that the proposed intervals are robust to departures from modelling assumptions, including asymmetry and long tails in the distributions of errors and random effects, outperforming competitors in terms of empirical coverage probability. Finally, the methodology is applied to construct confidence intervals for household income across counties in the Spanish region of Galicia.
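The abstract names a semiparametric random effect bootstrap but does not spell out its steps. The sketch below illustrates the general idea under strong simplifying assumptions that are not taken from the paper: a balanced random-intercept model y_ij = mu + u_i + e_ij with no covariates, moment (ANOVA) estimators of the variance components, and cluster-wise percentile intervals only. The function names `fit`, `rescale`, and `bootstrap_intervals` are hypothetical, and the authors' actual procedure may differ in every one of these choices.

```python
import random
import statistics


def fit(y):
    """Moment-based fit of y[i][j] = mu + u_i + e_ij (balanced clusters assumed)."""
    m, n = len(y), len(y[0])
    means = [statistics.fmean(row) for row in y]
    mu = statistics.fmean(means)
    # Within- and between-cluster mean squares from a one-way ANOVA.
    msw = sum((v - means[i]) ** 2 for i, row in enumerate(y) for v in row) / (m * (n - 1))
    msb = n * sum((b - mu) ** 2 for b in means) / (m - 1)
    var_e = msw
    var_u = max(0.0, (msb - msw) / n)
    # Shrinkage factor of the empirical best linear unbiased predictor.
    gamma = var_u / (var_u + var_e / n) if var_u > 0 else 0.0
    u = [gamma * (b - mu) for b in means]      # predicted random effects
    theta = [mu + ui for ui in u]              # predicted mixed effects
    resid = [[v - theta[i] for v in row] for i, row in enumerate(y)]
    return mu, var_u, var_e, u, theta, resid


def rescale(values, target_var):
    """Centre values and rescale them so their empirical variance equals target_var."""
    centre = statistics.fmean(values)
    centred = [v - centre for v in values]
    emp_var = statistics.fmean([c * c for c in centred])
    scale = (target_var / emp_var) ** 0.5 if emp_var > 0 else 0.0
    return [scale * c for c in centred]


def bootstrap_intervals(y, n_boot=500, alpha=0.05, seed=0):
    """Cluster-wise intervals for mu + u_i from a random effect bootstrap (sketch)."""
    rng = random.Random(seed)
    m, n = len(y), len(y[0])
    mu, var_u, var_e, u, theta, resid = fit(y)
    # Resampling pools: no distribution is assumed for u_i or e_ij.
    u_pool = rescale(u, var_u)
    e_pool = rescale([r for row in resid for r in row], var_e)
    errors = [[] for _ in range(m)]
    for _ in range(n_boot):
        u_star = rng.choices(u_pool, k=m)
        y_star = [[mu + u_star[i] + rng.choice(e_pool) for _ in range(n)]
                  for i in range(m)]
        theta_hat_star = fit(y_star)[4]
        for i in range(m):
            # Prediction error against the known bootstrap "truth" mu + u*_i.
            errors[i].append(theta_hat_star[i] - (mu + u_star[i]))
    intervals = []
    for i in range(m):
        errs = sorted(errors[i])
        q_lo = errs[int((alpha / 2) * (n_boot - 1))]
        q_hi = errs[int((1 - alpha / 2) * (n_boot - 1))]
        intervals.append((theta[i] - q_hi, theta[i] - q_lo))
    return theta, intervals
```

Calling `bootstrap_intervals(data)` on a list of equally sized clusters returns the predicted mixed effects and one interval per cluster. Because the pools are built from rescaled predictions and residuals, skewness or heavy tails in the data carry over into the bootstrap samples. Simultaneous intervals would require a joint statistic across clusters, which this sketch does not attempt.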
About the journal:
Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas:
I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics, computer graphics, computer-intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge-based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article.
II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model-free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures.
[...]
III) Special Applications - [...]
IV) Annals of Statistical Data Science [...]