Manu Kumar Gundappa, Diego Robledo, Alastair Hamilton, Ross D. Houston, James G. D. Prendergast, Daniel J. Macqueen
{"title":"High performance imputation of structural and single nucleotide variants using low-coverage whole genome sequencing","authors":"Manu Kumar Gundappa, Diego Robledo, Alastair Hamilton, Ross D. Houston, James G. D. Prendergast, Daniel J. Macqueen","doi":"10.1186/s12711-025-00962-6","DOIUrl":null,"url":null,"abstract":"Whole genome sequencing (WGS), despite its advantages, is yet to replace methods for genotyping single nucleotide variants (SNVs) such as SNP arrays and targeted genotyping assays. Structural variants (SVs) have larger effects on traits than SNVs, but are more challenging to accurately genotype. Using low-coverage WGS with genotype imputation offers a cost-effective strategy to achieve genome-wide variant coverage, but is yet to be tested for SVs. Here, we investigate combined SNV and SV imputation with low-coverage WGS data in Atlantic salmon (Salmo salar). As the reference panel, we used genotypes for high-confidence SVs and SNVs for n = 365 wild individuals sampled from diverse populations. We also generated 15 × WGS data (n = 20 samples) for a commercial population external to the reference panel, and called SVs and SNVs with gold-standard approaches. An imputation method selected for its established performance using low-coverage sequencing data (GLIMPSE) was tested at WGS depths of 1 × , 2 × , 3 × , and 4 × for samples within and external to the reference panel. SNVs were imputed with high accuracy and recall across all WGS depths, including for samples out-with the reference panel. For SVs, we compared imputation based purely on linkage disequilibrium (LD) with SNVs, to that supplemented with SV genotype likelihoods (GLs) from low-coverage WGS. Including SV GLs increased imputation accuracy, but as a trade-off with recall, requiring 3–4 × depth for best performance. Combining strategies allowed us to capture 84% of the reference panel deletions with 87% accuracy at 1 × depth. We also show that SV length affects imputation performance, with provision of SV GLs greatly enhancing accuracy for the longest SVs in the dataset. This study highlights the promise of reference panel imputation using low-coverage WGS, including novel opportunities to enhance the resolution of genome-wide association studies by capturing SVs.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"57 1","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetics Selection Evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12711-025-00962-6","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AGRICULTURE, DAIRY & ANIMAL SCIENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Whole genome sequencing (WGS), despite its advantages, is yet to replace methods for genotyping single nucleotide variants (SNVs) such as SNP arrays and targeted genotyping assays. Structural variants (SVs) have larger effects on traits than SNVs, but are more challenging to accurately genotype. Using low-coverage WGS with genotype imputation offers a cost-effective strategy to achieve genome-wide variant coverage, but is yet to be tested for SVs. Here, we investigate combined SNV and SV imputation with low-coverage WGS data in Atlantic salmon (Salmo salar). As the reference panel, we used genotypes for high-confidence SVs and SNVs for n = 365 wild individuals sampled from diverse populations. We also generated 15 × WGS data (n = 20 samples) for a commercial population external to the reference panel, and called SVs and SNVs with gold-standard approaches. An imputation method selected for its established performance using low-coverage sequencing data (GLIMPSE) was tested at WGS depths of 1 × , 2 × , 3 × , and 4 × for samples within and external to the reference panel. SNVs were imputed with high accuracy and recall across all WGS depths, including for samples out-with the reference panel. For SVs, we compared imputation based purely on linkage disequilibrium (LD) with SNVs, to that supplemented with SV genotype likelihoods (GLs) from low-coverage WGS. Including SV GLs increased imputation accuracy, but as a trade-off with recall, requiring 3–4 × depth for best performance. Combining strategies allowed us to capture 84% of the reference panel deletions with 87% accuracy at 1 × depth. We also show that SV length affects imputation performance, with provision of SV GLs greatly enhancing accuracy for the longest SVs in the dataset. This study highlights the promise of reference panel imputation using low-coverage WGS, including novel opportunities to enhance the resolution of genome-wide association studies by capturing SVs.
期刊介绍:
Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.