Comprehensive Assessment of Genotype Imputation Performance.

IF 1.5, Zone 4 (Biology), JCR Q4 GENETICS & HEREDITY

Human Heredity Pub Date : 2018-01-01 Epub Date: 2019-01-22 DOI:10.1159/000489758

Shuo Shi, Na Yuan, Ming Yang, Zhenglin Du, Jinyue Wang, Xin Sheng, Jiayan Wu, Jingfa Xiao

Human Heredity 83(3): 107-116. https://doi.org/10.1159/000489758

Citations: 44

Abstract

Genotype imputation is the process of estimating missing genotypes from a haplotype or genotype reference panel. It can effectively boost the power to detect single nucleotide polymorphisms (SNPs) in genome-wide association studies, integrate multiple studies for meta-analysis, and be applied in fine-mapping studies. The performance of genotype imputation is affected by many factors, including software, reference selection, sample size, and SNP density/sequencing coverage. A systematic evaluation of the imputation performance of currently popular software will benefit future studies. Here, we evaluate the imputation performance of Beagle4.1, IMPUTE2, MACH+Minimac3, and SHAPEIT2+IMPUTE2 using test samples of East Asian ancestry and reference panels from the 1000 Genomes Project. The results indicated that the accuracy of IMPUTE2 (99.18%) is slightly higher than that of the others (Beagle4.1: 98.94%, MACH+Minimac3: 98.51%, and SHAPEIT2+IMPUTE2: 99.08%). To achieve good and stable imputation quality, the SNP density needs to be > 200/Mb. The imputation accuracies of IMPUTE2 and Beagle4.1 were only slightly influenced by the study sample size. The extent to which the reference contributed to genotype imputation performance depended on software selection. We assessed the imputation performance on SNPs generated by next-generation whole genome sequencing and found that most of the SNP set detected by sequencing at 15× depth could be obtained by imputation from the haplotype reference panel of the 1000 Genomes Project, starting from SNP data detected by sequencing at 4× depth. All of the imputation software performed worse in low minor allele frequency SNP regions because of bias in the reference or the software. In the future, more comprehensive reference panels or new algorithm developments may rise to this challenge.
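
The quantities named in the abstract (imputation accuracy, SNP density per Mb, and minor allele frequency) can be illustrated with a minimal Python sketch. The abstract does not state how accuracy was computed, so the sketch assumes a plain genotype concordance rate. The function names and the 0/1/2 dosage encoding are illustrative assumptions and do not come from the paper or from any of the evaluated tools.

```python
from typing import Optional, Sequence

# Assumed encoding: alternate-allele dosage 0, 1, or 2; None marks a missing call.
Genotype = Optional[int]


def concordance(truth: Sequence[Genotype], imputed: Sequence[Genotype]) -> float:
    """Fraction of comparable calls where the imputed genotype equals the truth.

    Pairs in which either call is missing are skipped.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")
    pairs = [(t, i) for t, i in zip(truth, imputed)
             if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotype calls")
    return sum(t == i for t, i in pairs) / len(pairs)


def snp_density_per_mb(positions: Sequence[int],
                       region_start: int, region_end: int) -> float:
    """SNPs per megabase in the half-open interval [region_start, region_end)."""
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    n_snps = sum(region_start <= p < region_end for p in positions)
    return n_snps / ((region_end - region_start) / 1_000_000)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency from diploid dosages, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)
```

Under these assumptions, a region would meet the density threshold reported in the abstract when `snp_density_per_mb(...)` returns a value above 200, and grouping SNPs by `minor_allele_frequency` before calling `concordance` is one way to see the weaker performance at low frequencies.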

Source journal

Human Heredity (Biology: Genetics)

CiteScore: 2.50

Self-citation rate: 0.00%

Review time: >12 weeks

Journal description: Gathering original research reports and short communications from all over the world, Human Heredity is devoted to methodological and applied research on the genetics of human populations, association and linkage analysis, genetic mechanisms of disease, and new methods for statistical genetics, for example, analysis of rare variants and results from next-generation sequencing. The value of this information to many branches of medicine is shown by the number of citations the journal receives in fields ranging from immunology and hematology to epidemiology and public health planning, and by the fact that at least 50% of all Human Heredity papers are still cited more than 8 years after publication (according to ISI Journal Citation Reports). Special issues on methodological topics (such as ‘Consanguinity and Genomics’ in 2014; ‘Analyzing Rare Variants in Complex Diseases’ in 2012) or reviews of advances in particular fields (‘Genetic Diversity in European Populations: Evolutionary Evidence and Medical Implications’ in 2014; ‘Genes and the Environment in Obesity’ in 2013) are published every year. Renowned experts in the field are invited to contribute to these special issues.