A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy

Impact factor 3.6; Zone 1 (Agricultural and Forestry Sciences); JCR Q1 (AGRICULTURE, DAIRY & ANIMAL SCIENCE)
David Wragg, Wengang Zhang, Sarah Peterson, Murthy Yerramilli, Richard Mellanby, Jeffrey J. Schoenebeck, Dylan N. Clements
Genetics Selection Evolution. Journal article, published 2024-01-12. DOI: 10.1186/s12711-024-00875-w (https://doi.org/10.1186/s12711-024-00875-w)

Abstract

Low-pass whole-genome sequencing and imputation offer significant cost savings, enabling substantial increases in sample size and statistical power. This approach is particularly promising in livestock breeding, providing an affordable means of screening individuals for deleterious alleles or calculating genomic breeding values. Consequently, it may also be of value in companion animal genomics to support pedigree breeding. We sought to evaluate in dogs the impact of low-coverage sequencing and reference-guided imputation on genotype concordance and association analyses.

DNA isolated from saliva of 30 Labrador retrievers was sequenced at low (0.9X and 3.8X) and high (43.5X) coverage, and down-sampled from 43.5X to 9.6X and 17.4X. Genotype imputation was performed using a diverse reference panel (1021 dogs) and two subsets of that panel (256 dogs each), one of which had an excess of Labrador retrievers relative to other breeds. We observed little difference in imputed genotype concordance between reference panels. Association analyses for a locus acting as a disease proxy were performed using single-marker (GEMMA) and haplotype-based (XP-EHH) tests. GEMMA results were highly correlated (r ≥ 0.97) between 43.5X and ≥ 3.8X depths of coverage, while for 0.9X the correlation was lower (r ≤ 0.8). XP-EHH results were less well correlated, with r ranging from 0.58 (0.9X) to 0.88 (17.4X). Across a random sample of 10,000 genomic regions averaging 17 kb in size, we observed a median of three haplotypes per dog across the sequencing depths, with 5% of the regions returning more than eight haplotypes. Inspection of one such region revealed genotype and phasing inconsistencies across sequencing depths.

We demonstrate that saliva-derived canine DNA is suitable for whole-genome sequencing, highlighting the feasibility of client-based sampling. Low-pass sequencing and imputation require caution, because incorrect allele assignments result when the subject possesses alleles that are absent from the reference panel. Larger panels have the capacity for greater allelic diversity, which should reduce the potential for imputation error. Although low-pass sequencing can accurately impute allele dosage, we highlight issues with phasing accuracy that impact haplotype-based analyses. Consequently, if accurately phased genotypes are required for analyses, we advocate sequencing at high depth (> 20X).
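Two quantities mentioned in the abstract, the fraction of reads retained when down-sampling and the concordance between imputed and high-depth genotypes, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the concordance definition (share of sites with identical calls, missing calls skipped) is an assumption, and the genotype vectors are made-up toy values rather than data from the study. Only the coverage figures (43.5X, 9.6X, 17.4X) come from the abstract.

```python
from typing import Sequence


def downsample_fraction(source_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so mean coverage drops from source to target."""
    if not 0 < target_depth <= source_depth:
        raise ValueError("target depth must be positive and at most the source depth")
    return target_depth / source_depth


def genotype_concordance(truth: Sequence[int], imputed: Sequence[int]) -> float:
    """Share of sites where the imputed genotype equals the truth genotype.

    Genotypes are alternate-allele counts (0, 1, 2). A value of -1 marks a
    missing call; sites missing in either set are skipped (an assumption).
    """
    if len(truth) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(t, i) for t, i in zip(truth, imputed) if t >= 0 and i >= 0]
    if not pairs:
        raise ValueError("no sites with calls in both sets")
    return sum(t == i for t, i in pairs) / len(pairs)


# Read fractions for the two down-sampled depths reported in the abstract.
fractions = {d: round(downsample_fraction(43.5, d), 3) for d in (9.6, 17.4)}

# Toy genotypes for illustration only (not study data).
truth_calls = [0, 1, 2, 1, 0, 2, -1, 1]
imputed_calls = [0, 1, 2, 0, 0, 2, 1, 1]
concordance = genotype_concordance(truth_calls, imputed_calls)

print(fractions)
print(f"concordance = {concordance:.3f}")
```

Note that a site-level concordance of this kind says nothing about phase: two call sets can agree on allele dosage at every site and still assign alleles to different haplotypes, which is the distinction the abstract draws between the GEMMA and XP-EHH results.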
Source journal

Genetics Selection Evolution (Biology: Dairy and Animal Science)

CiteScore: 6.50
Self-citation rate: 9.80%
Articles published: 74
Review time: 1 month

About the journal: Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, and the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.