Genotype imputation from low-coverage data for medical and population genetic analyses

IF 5.5 | CAS Zone 2 (Biology) | JCR Q1, Biochemistry & Molecular Biology

Genome Research | Pub Date: 2025-07-22 | DOI: 10.1101/gr.280175.124

Simone Andrea Biagini, Sara Becelaere, Mio Aerden, Tatjana Jatsenko, Laurens Hannes, Philip Van Damme, Jeroen Breckpot, Koenraad Devriendt, Bernard Thienpont, Joris Robert Vermeesch, Isabelle Cleynen, Toomas Kivisild


Citations: 0

Abstract

Genotype imputation from low-pass sequencing data presents unique opportunities for genomic analyses but comes with specific challenges. In this study, we explore the impact of quality filters on genetic ancestry and polygenic score (PGS) estimation after imputing 32,769 low-pass genome-wide sequences (LPS) from noninvasive prenatal screening (NIPS) with an average autosomal sequence depth of ~0.15×. In scenarios involving ultra-low-coverage sequences, conventional approaches to enhancing accuracy may fail, especially when multiple samples are pooled. To increase the proportion of high-quality genotypes in large datasets, we introduce a filtering approach called GDI that combines genotype probability (GP), alternate allele dosage (DS), and INFO score filters. We demonstrate that, when applied without filters, the imputation tools QUILT and GLIMPSE2 achieve similar accuracy, which is high enough for broad-scale ancestry mapping but insufficient for high-resolution principal component analysis (PCA). With the GDI approach, we achieve quality that is adequate for high-resolution PCA. Furthermore, we explore the impact of imputation errors, choice of variants, and filtering methods on PGS prediction of height in 1,911 subjects with height data. We show that polygenic scores predict 23.7% of the variance in height in our imputed data and that, contrary to its effect on PCA, the GDI filter does not improve PGS performance in height prediction. These results highlight that imputed LPS data can be leveraged for further biomedical and population genetic use, but each downstream analysis tool must be considered individually for its imputation quality thresholds and filtering requirements.
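The abstract names the three ingredients of the GDI filter (GP, DS, and INFO) but does not state the thresholds or the exact rule for combining them. The following is therefore only a minimal sketch of what a combined filter of this kind could look like, not the authors' implementation. The class `ImputedCall`, the function `passes_gdi_like_filter`, all three threshold values, and the choice to require every criterion to hold at once are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ImputedCall:
    """One imputed genotype at one variant (hypothetical container)."""
    gp: Tuple[float, float, float]  # posterior probabilities of 0, 1, 2 alternate alleles
    ds: float                       # alternate allele dosage, expected in [0, 2]
    info: float                     # per-variant INFO score


# Hypothetical thresholds, chosen only for illustration; the abstract gives none.
MIN_MAX_GP = 0.9
MAX_DS_DEVIATION = 0.1
MIN_INFO = 0.8


def passes_gdi_like_filter(
    call: ImputedCall,
    min_max_gp: float = MIN_MAX_GP,
    max_ds_deviation: float = MAX_DS_DEVIATION,
    min_info: float = MIN_INFO,
) -> bool:
    """Keep a call only if all three quality criteria hold (assumed rule)."""
    best_gp = max(call.gp)
    # Index of the most probable genotype equals its alternate allele count.
    best_genotype = call.gp.index(best_gp)

    gp_ok = best_gp >= min_max_gp                                # confident genotype
    ds_ok = abs(call.ds - best_genotype) <= max_ds_deviation     # dosage near a whole genotype
    info_ok = call.info >= min_info                              # well-imputed variant
    return gp_ok and ds_ok and info_ok


confident = ImputedCall(gp=(0.01, 0.02, 0.97), ds=1.96, info=0.95)
uncertain = ImputedCall(gp=(0.50, 0.40, 0.10), ds=0.60, info=0.95)
low_info = ImputedCall(gp=(0.99, 0.01, 0.00), ds=0.01, info=0.30)

kept = [c for c in (confident, uncertain, low_info) if passes_gdi_like_filter(c)]
```

In this sketch the GP and DS checks act per genotype, while the INFO check acts per variant, so a variant with a low INFO score is dropped for every sample regardless of individual genotype confidence.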


Source journal

Genome Research (Biology: Biochemistry & Molecular Biology)

CiteScore: 12.40

Self-citation rate: 1.40%

Articles published: 140

Review time: 6 months

About the journal: Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or as methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's website where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.