Best practices for analyzing imputed genotypes from low-pass sequencing in dogs.

Reuben M Buckley, Alex C Harris, Guo-Dong Wang, D Thad Whitaker, Ya-Ping Zhang, Elaine A Ostrander
Journal: Mammalian genome : official journal of the International Mammalian Genome Society, pages 213-229. Published 2022-03-01 (Epub 2021-09-08). Publication type: Journal Article.
DOI: 10.1007/s00335-021-09914-z (https://doi.org/10.1007/s00335-021-09914-z)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8913487/pdf/
Citation count: 8

Abstract

Although DNA array-based approaches for genome-wide association studies (GWAS) permit the collection of thousands of low-cost genotypes, this often comes at the expense of resolution and completeness, as SNP chip technologies are ultimately limited by the SNPs chosen during array development. An alternative low-cost approach is low-pass whole genome sequencing (WGS) followed by imputation. Rather than relying on high levels of genotype confidence at a set of select loci, low-pass WGS and imputation rely on the combined information from millions of randomly sampled low-confidence genotypes. To investigate low-pass WGS and imputation in the dog, we assessed accuracy and performance by downsampling 97 high-coverage (> 15×) WGS datasets from 51 different breeds to approximately 1× coverage, simulating low-pass WGS. Using a reference panel of 676 dogs from 91 breeds, genotypes were imputed from the downsampled data and compared to a truth set of genotypes generated from high-coverage WGS. Using our truth set, we optimized a variant quality filtering strategy that retained approximately 80% of 14 million imputed sites and lowered the imputation error rate from 3.0% to 1.5%. Seven million sites remained with a minor allele frequency (MAF) > 5% and an average imputation quality score of 0.95. Finally, we simulated the impact of imputation errors on case-control GWAS outcomes: small effect sizes were most affected, whereas medium-to-large effect sizes were only slightly affected. These analyses provide best practice guidelines for study design and data post-processing of low-pass WGS-imputed genotypes in dogs.
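
The evaluation summarized above (comparing imputed genotypes with a truth set, then filtering sites on a quality score) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Site` record, the per-site `quality` field, the toy genotypes, and the 0.9 threshold are all assumptions made for illustration, and the abstract does not specify which quality metric or cutoff was used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """One variant site (hypothetical record layout, not from the paper)."""
    truth: List[int]    # truth genotypes per sample as alt-allele counts: 0, 1 or 2
    imputed: List[int]  # imputed genotypes for the same samples, same encoding
    quality: float      # assumed per-site imputation quality score in [0, 1]


def error_rate(sites: List[Site]) -> float:
    """Fraction of genotype calls where the imputed call differs from the truth."""
    mismatches = sum(t != i for s in sites for t, i in zip(s.truth, s.imputed))
    total = sum(len(s.truth) for s in sites)
    return mismatches / total if total else 0.0


def maf(genotypes: List[int]) -> float:
    """Minor allele frequency from diploid alt-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def filter_sites(sites: List[Site], min_quality: float, min_maf: float = 0.0) -> List[Site]:
    """Keep sites that pass both the quality and the MAF threshold (illustrative cutoffs)."""
    return [s for s in sites if s.quality >= min_quality and maf(s.imputed) >= min_maf]


# Toy data: four sites, four samples each (made-up values).
sites = [
    Site(truth=[0, 1, 2, 0], imputed=[0, 1, 2, 0], quality=0.98),
    Site(truth=[0, 0, 1, 1], imputed=[0, 1, 1, 1], quality=0.60),
    Site(truth=[2, 2, 1, 0], imputed=[2, 2, 1, 0], quality=0.95),
    Site(truth=[0, 0, 0, 1], imputed=[0, 0, 1, 1], quality=0.40),
]

kept = filter_sites(sites, min_quality=0.9, min_maf=0.05)
print("error rate before filtering:", error_rate(sites))  # 2 of 16 calls differ -> 0.125
print("error rate after filtering:", error_rate(kept))    # -> 0.0
print("fraction of sites retained:", len(kept) / len(sites))  # -> 0.5
```

In this toy case the filter removes the two low-quality sites that carry all of the errors. With real data the trade-off is less clean, which is why a threshold has to be tuned against a truth set, as the abstract describes.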
