Best practices for analyzing imputed genotypes from low-pass sequencing in dogs.

Mammalian genome : official journal of the International Mammalian Genome Society. Pub Date: 2022-03-01. Epub Date: 2021-09-08. DOI: 10.1007/s00335-021-09914-z

Reuben M Buckley, Alex C Harris, Guo-Dong Wang, D Thad Whitaker, Ya-Ping Zhang, Elaine A Ostrander

Pages: 213-229. Publication type: Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8913487/pdf/

Citations: 8

Abstract

Although DNA array-based approaches for genome-wide association studies (GWAS) permit the collection of thousands of low-cost genotypes, this often comes at the expense of resolution and completeness, because SNP chip technologies are ultimately limited by the SNPs chosen during array development. An alternative low-cost approach is low-pass whole genome sequencing (WGS) followed by imputation. Rather than relying on high genotype confidence at a select set of loci, low-pass WGS and imputation rely on the combined information from millions of randomly sampled low-confidence genotypes. To investigate low-pass WGS and imputation in the dog, we assessed accuracy and performance by downsampling 97 high-coverage (> 15×) WGS datasets from 51 different breeds to approximately 1× coverage, simulating low-pass WGS. Using a reference panel of 676 dogs from 91 breeds, genotypes were imputed from the downsampled data and compared to a truth set of genotypes generated from high-coverage WGS. Using our truth set, we optimized a variant quality filtering strategy that retained approximately 80% of 14 M imputed sites and lowered the imputation error rate from 3.0% to 1.5%. Seven million sites remained with a minor allele frequency (MAF) > 5% and an average imputation quality score of 0.95. Finally, we simulated the impact of imputation errors on the outcomes of case-control GWAS: small effect sizes were most affected, whereas medium-to-large effect sizes were only slightly affected. These analyses provide best practice guidelines for study design and data post-processing of low-pass WGS-imputed genotypes in dogs.
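The evaluation described in the abstract (downsample, impute, compare to a truth set, then filter on quality) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `Site` record, the function names, the toy genotypes, and the 0.9 quality cutoff are all hypothetical, and the error rate is assumed here to be simple genotype discordance; the abstract does not state the exact metric or the thresholds used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """One variant site across a few samples (hypothetical toy record)."""
    truth: List[int]    # alt-allele counts (0/1/2) from high-coverage WGS
    imputed: List[int]  # alt-allele counts imputed from ~1x data
    quality: float      # per-site imputation quality score in [0, 1]
    maf: float          # minor allele frequency


def downsample_fraction(original_cov: float, target_cov: float = 1.0) -> float:
    """Fraction of reads to keep so original_cov shrinks to about target_cov."""
    return min(1.0, target_cov / original_cov)


def error_rate(sites: List[Site]) -> float:
    """Genotype discordance: share of imputed calls that differ from truth."""
    total = sum(len(s.truth) for s in sites)
    wrong = sum(t != i for s in sites for t, i in zip(s.truth, s.imputed))
    return wrong / total if total else 0.0


def filter_sites(sites: List[Site], min_quality: float = 0.0,
                 min_maf: float = 0.0) -> List[Site]:
    """Keep sites with quality >= min_quality and MAF strictly above min_maf."""
    return [s for s in sites if s.quality >= min_quality and s.maf > min_maf]


# Toy data, invented for illustration only.
sites = [
    Site([0, 1, 2, 1], [0, 1, 2, 1], quality=0.98, maf=0.30),
    Site([0, 0, 1, 2], [0, 1, 1, 2], quality=0.55, maf=0.12),
    Site([2, 2, 1, 0], [2, 2, 1, 0], quality=0.91, maf=0.04),
    Site([1, 1, 0, 0], [1, 0, 0, 0], quality=0.93, maf=0.22),
]

kept = filter_sites(sites, min_quality=0.9)            # hypothetical cutoff
common = filter_sites(sites, min_quality=0.9, min_maf=0.05)

print("fraction of reads to keep for 15x input:", downsample_fraction(15.0))
print("error rate before filtering:", error_rate(sites))
print("error rate after filtering:", error_rate(kept))
print("sites retained:", len(kept), "of", len(sites))
print("retained sites with MAF > 5%:", len(common))
```

In this toy case the quality filter drops the one low-scoring site and the discordance falls with it, which mirrors the trade-off the abstract reports between sites retained and error rate. Real data would normally be read from VCF files rather than built by hand.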
