Genotype imputation from low-coverage data for medical and population genetic analyses
Simone Andrea Biagini, Sara Becelaere, Mio Aerden, Tatjana Jatsenko, Laurens Hannes, Philip Van Damme, Jeroen Breckpot, Koenraad Devriendt, Bernard Thienpont, Joris Robert Vermeesch, Isabelle Cleynen, Toomas Kivisild
Genome Research, published 2025-07-22. DOI: 10.1101/gr.280175.124 (https://doi.org/10.1101/gr.280175.124)
Abstract
Genotype imputation from low-pass sequencing data presents unique opportunities for genomic analyses but comes with specific challenges. In this study, we explore the impact of quality filters on genetic ancestry and polygenic score (PGS) estimation after imputing 32,769 low-pass genome-wide sequences (LPS) from noninvasive prenatal screening (NIPS) with an average autosomal sequence depth of ~0.15×. In scenarios involving ultra-low-coverage sequences, conventional approaches to enhance accuracy may fail, especially when multiple samples are pooled. To increase the proportion of high-quality genotypes in large datasets, we introduce a filtering approach called GDI that combines genotype probability (GP), alternate allele dosage (DS), and INFO score filters. We demonstrate that, when applied without filters, the imputation tools QUILT and GLIMPSE2 achieve similar accuracy, which is high enough for broad-scale ancestry mapping but insufficient for high-resolution principal component analysis (PCA). With the GDI approach, we achieve quality that is adequate for such purposes. Furthermore, we explored the impact of imputation errors, choice of variants, and filtering methods on PGS prediction for height in 1,911 subjects with height data. We show that polygenic scores predict 23.7% of the variance in height in our imputed data and that, contrary to its effect on PCA, the GDI filter does not improve the performance of PGS in height prediction. These results highlight that imputed LPS data can be leveraged for further biomedical and population genetic use, but each downstream analysis tool needs to be considered individually for its imputation quality thresholds and filtering requirements.
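The abstract names the three ingredients of the GDI filter (GP, DS, and INFO) but does not give thresholds or the exact rule for combining them. The following is therefore only a minimal sketch of how such a combined filter might look, not the authors' implementation. The names `ImputedSite` and `gdi_filter`, the threshold values, and the interpretation of the DS filter as "dosage must lie close to the called genotype" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Genotype probabilities for one sample: P(0/0), P(0/1), P(1/1).
GP = Tuple[float, float, float]


@dataclass
class ImputedSite:
    """One imputed biallelic variant (hypothetical container)."""
    info: float    # per-variant INFO score reported by the imputation tool
    gp: List[GP]   # per-sample genotype probabilities


def dosage(gp: GP) -> float:
    """Expected alternate allele count (DS) implied by the probabilities."""
    return gp[1] + 2.0 * gp[2]


def gdi_filter(
    site: ImputedSite,
    min_info: float = 0.8,    # assumed threshold, not taken from the paper
    min_gp: float = 0.9,      # assumed threshold, not taken from the paper
    max_ds_dev: float = 0.1,  # assumed threshold, not taken from the paper
) -> Optional[List[Optional[int]]]:
    """Return None if the whole site fails the INFO filter.

    Otherwise return one entry per sample: the hard-called alternate
    allele count (0, 1, or 2) if the genotype passes both the GP and DS
    checks, or None if that genotype is masked as missing.
    """
    if site.info < min_info:
        return None

    calls: List[Optional[int]] = []
    for gp in site.gp:
        # Most likely genotype and its probability.
        best = max(range(3), key=lambda g: gp[g])
        # Assumed DS rule: dosage must be close to the called genotype.
        confident = gp[best] >= min_gp and abs(dosage(gp) - best) <= max_ds_dev
        calls.append(best if confident else None)
    return calls


site = ImputedSite(
    info=0.95,
    gp=[(0.98, 0.02, 0.00), (0.50, 0.45, 0.05), (0.01, 0.03, 0.96)],
)
print(gdi_filter(site))
```

In this sketch the INFO score acts at the variant level while GP and DS act on individual genotypes, so a site can be retained with some samples masked. Whether the published GDI filter drops whole sites or masks single genotypes is not stated in the abstract.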
About the journal:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.