Robustness of the linear mixed effects model to error distribution assumptions and the consequences for genome-wide association studies.

IF 0.4 Zone 4 (Mathematics) Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY

Statistical Applications in Genetics and Molecular Biology Pub Date : 2014-10-01 DOI:10.1515/sagmb-2013-0066

Nicole M Warrington, Kate Tilling, Laura D Howe, Lavinia Paternoster, Craig E Pennell, Yan Yan Wu, Laurent Briollais

Volume 13, issue 5, pages 567-87.

Citations: 0

Abstract

Genome-wide association studies have been successful in uncovering novel genetic variants that are associated with disease status or cross-sectional phenotypic traits. Researchers are beginning to investigate how genes play a role in the development of a trait over time. Linear mixed effects models (LMM) are commonly used to model longitudinal data; however, it is unclear whether failure to meet the model's distributional assumptions will affect the conclusions when conducting a genome-wide association study. In an extensive simulation study, we compare coverage probabilities, bias, type 1 error rates and statistical power when the error of the LMM is either heteroscedastic or has a non-Gaussian distribution. We conclude that the model is robust to misspecification if the same function of age is included in the fixed and random effects. However, type 1 error of the genetic effect over time is inflated, regardless of the model misspecification, if the polynomial function for age in the fixed and random effects differs. In situations where the model will not converge with a high-order polynomial function in the random effects, a reduced function can be used, but a robust standard error needs to be calculated to avoid inflation of the type 1 error. As an illustration, an LMM was applied to longitudinal body mass index (BMI) data over childhood in the ALSPAC cohort; the results emphasised the need for the robust standard error to ensure correct inference of associations of longitudinal BMI with chromosome 16 single nucleotide polymorphisms.
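The abstract does not say how the robust standard error is computed. Purely as an illustration of the general idea, the Python sketch below (NumPy only) compares a model-based standard error with a cluster-robust "sandwich" standard error. It makes several loud assumptions that are not taken from the paper: it fits ordinary least squares instead of an LMM, so the working model ignores the random intercept and slope entirely; the function names `cluster_robust_se` and `simulate`, the sample sizes, the allele frequency and the variance settings are all hypothetical; and no small-sample correction is applied.

```python
import numpy as np


def cluster_robust_se(X, y, groups):
    """OLS fit with model-based and cluster-robust (sandwich) standard errors.

    Illustrative only: a plain OLS working model stands in for the LMM.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape

    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # Model-based SE: assumes independent errors with constant variance.
    sigma2 = resid @ resid / (n - p)
    se_model = np.sqrt(np.diag(sigma2 * bread))

    # Sandwich "meat": sum over individuals of the outer product of each
    # individual's score vector X_i' r_i (no small-sample correction).
    meat = np.zeros((p, p))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    se_robust = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_model, se_robust


def simulate(n_people=300, n_visits=6, seed=0):
    """Simulate longitudinal data with a null SNP effect (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_people), n_visits)
    age = np.tile(np.linspace(0.0, 5.0, n_visits), n_people)
    snp = np.repeat(rng.binomial(2, 0.3, n_people), n_visits).astype(float)

    # Person-specific random intercept and random slope for age.
    b0 = np.repeat(rng.normal(0.0, 0.5, n_people), n_visits)
    b1 = np.repeat(rng.normal(0.0, 1.0, n_people), n_visits)
    # Heteroscedastic residual error: standard deviation grows with age.
    noise = rng.normal(0.0, 0.5 + 0.2 * age)

    # The SNP has no true effect on the level or the slope of the trait.
    y = 1.0 + 0.5 * age + b0 + b1 * age + noise
    X = np.column_stack([np.ones_like(age), age, snp, snp * age])
    return X, y, ids


if __name__ == "__main__":
    X, y, ids = simulate()
    beta, se_model, se_robust = cluster_robust_se(X, y, ids)
    names = ["intercept", "age", "snp", "snp:age"]
    for name, b, s_m, s_r in zip(names, beta, se_model, se_robust):
        print(f"{name:>9}  beta={b: .3f}  model SE={s_m:.3f}  robust SE={s_r:.3f}")
```

In this toy setting the working model omits the random slope, so the model-based standard error of the `snp:age` term tends to be too small, which is the kind of mismatch that inflates type 1 error; the sandwich estimator uses the within-person residual products and does not rely on the assumed covariance structure being correct.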


Source journal

Statistical Applications in Genetics and Molecular Biology (BIOCHEMISTRY & MOLECULAR BIOLOGY; MATHEMATICAL & COMPUTATIONAL BIOLOGY)

Self-citation rate

11.10%

Articles published

Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues, but papers should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.