Fast and efficient correction for population stratification in multi-locus genome-wide association studies.

IF 1.3, Zone 4 (Biology), Q4 GENETICS & HEREDITY
Genetica, Pub Date: 2021-12-01, Epub Date: 2021-09-04, DOI: 10.1007/s10709-021-00129-3
Rui Liu, Min Yuan, Xu Steven Xu, Yaning Yang
Volume 149, Issue 5-6, pp. 313-325
Citations: 1

Abstract

Reducing false discoveries caused by population stratification (PS) has long been a challenge in genome-wide association studies (GWAS). The existing literature has established several single-marker approaches, including genomic control (GC), EIGENSTRAT, and the generalized linear mixed model association test (GMMAT), as well as multi-marker methods such as the LASSO mixed model (LASSOMM). However, the single-marker methods require prespecifying an arbitrary p-value threshold for marker selection, which is likely to result in suboptimal precision or recall. LASSOMM, on the other hand, appears to be extremely computationally intensive and may not be suitable for large-scale GWAS. In this paper, we propose a simple multi-marker approach (PCA-LASSO) that combines principal component analysis (PCA) with the least absolute shrinkage and selection operator (LASSO). We use PCA to correct for the confounding effects of PS, and LASSO with built-in cross-validation for data-driven marker selection. Compared with the current single-marker approaches, the proposed PCA-LASSO provides an optimal balance between precision and recall and, consequently, superior F1 scores. Similarly, compared with LASSOMM, PCA-LASSO markedly increases precision while minimizing the loss of recall, and therefore improves the overall F1 score in the presence of PS. More importantly, PCA-LASSO reduces computation time by more than 1000-fold compared with LASSOMM. We applied PCA-LASSO to a real Alzheimer's disease dataset and successfully identified SNP rs429358 (APOE gene, ε4 allele), which has been widely reported to be associated with the onset and elevated risk of Alzheimer's disease. In conclusion, PCA-LASSO is a simple, fast, and accurate approach for GWAS in the presence of latent PS.
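The abstract describes PCA-LASSO only at a high level: PCA corrects for PS, and LASSO with cross-validation selects markers, with selection quality scored by F1 = 2PR/(P + R) for precision P and recall R. The sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes EIGENSTRAT-style correction (regressing the top-k PC scores out of both genotypes and phenotype), a Gaussian (squared-error) LASSO fitted by coordinate descent, and k-fold CV over a log-spaced λ grid. All function names (`top_pcs`, `residualize`, `lasso_cd`, `pca_lasso`, `f1_score`) and defaults such as `k=2` are hypothetical, and a case-control study like the Alzheimer's application would more naturally use a logistic LASSO.

```python
# Hypothetical sketch of a PCA + cross-validated LASSO pipeline (not the authors' code).
# Assumes: top PC scores are regressed out of genotypes and phenotype (EIGENSTRAT-style),
# and a Gaussian LASSO is fitted by coordinate descent.
import numpy as np


def top_pcs(G, k):
    """Top-k principal component scores of the column-standardized genotype matrix G (n x p)."""
    Z = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)  # standardize each SNP
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]  # n x k matrix of PC scores


def residualize(M, P):
    """Remove the part of each column of M explained by an intercept plus the PCs in P."""
    X = np.column_stack([np.ones(len(P)), P])
    beta, *_ = np.linalg.lstsq(X, M, rcond=None)
    return M - X @ beta


def lasso_cd(X, y, lam, n_iter=500, tol=1e-6):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 on centered X and y."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # current residual y - Xb
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # monomorphic SNP: nothing to fit
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r += X[:, j] * (b[j] - new)
            max_step = max(max_step, abs(new - b[j]))
            b[j] = new
        if max_step < tol:
            break
    return b


def pca_lasso(G, y, k=2, n_folds=5, n_lambdas=10, seed=0):
    """Return indices of SNPs selected by LASSO on PC-corrected data, lambda chosen by CV."""
    P = top_pcs(G, k)
    Xr = residualize(G.astype(float), P)
    yr = residualize(np.asarray(y, dtype=float).reshape(-1, 1), P).ravel()
    n = len(yr)
    lam_max = np.max(np.abs(Xr.T @ yr)) / n  # smallest lambda that zeroes every coefficient
    lams = lam_max * np.logspace(0, -2, n_lambdas)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    cv_err = np.zeros(n_lambdas)
    for f in range(n_folds):
        train, test = folds != f, folds == f
        for i, lam in enumerate(lams):
            b = lasso_cd(Xr[train], yr[train], lam)
            cv_err[i] += np.sum((yr[test] - Xr[test] @ b) ** 2)
    b = lasso_cd(Xr, yr, lams[np.argmin(cv_err)])  # refit on all data at the CV-best lambda
    return np.flatnonzero(b != 0), b


def f1_score(selected, causal):
    """F1 = 2PR / (P + R) of a selected SNP set against the true causal set."""
    selected, causal = set(selected), set(causal)
    tp = len(selected & causal)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(selected), tp / len(causal)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pop = np.repeat([0, 1], 150)                 # two latent subpopulations
    freqs = rng.uniform(0.1, 0.9, size=(2, 30))  # allele frequencies differ by subpopulation
    G = rng.binomial(2, freqs[pop])              # 300 x 30 genotypes coded 0/1/2
    causal = [3, 11, 20]
    y = G[:, causal].sum(axis=1) + 2.0 * pop + rng.normal(size=300)  # PS shifts the mean
    selected, _ = pca_lasso(G, y)
    print("selected:", selected.tolist(), "F1:", round(f1_score(selected, causal), 3))
```

Residualizing on the PCs is, for a squared-error loss, equivalent to including the PCs as unpenalized covariates in the LASSO (by the Frisch-Waugh-Lovell argument), so the sketch keeps the LASSO step free of covariate handling. The simulated data in the usage block are purely illustrative.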

Source journal: Genetica (Biology: Genetics)
CiteScore: 2.70
Self-citation rate: 0.00%
Articles published: 32
Review time: >12 weeks
About the journal: Genetica publishes papers dealing with genetics, genomics, and evolution. Our journal covers novel advances in the fields of genomics, conservation genetics, genotype-phenotype interactions, evo-devo, population and quantitative genetics, and biodiversity. Genetica publishes original research articles addressing novel conceptual, experimental, and theoretical issues in these areas, whatever the taxon considered. Biomedical papers and papers on breeding animal and plant genetics are not within the scope of Genetica, unless framed in an evolutionary context. Recent advances in genetics, genomics and evolution are also published in thematic issues and synthesis papers published by experts in the field.