High Imputation Accuracy Can Be Achieved Using a Small Reference Panel in a Natural Population With Low Genetic Diversity
Hui Zhen Tan, Katarina C Stuart, Tram Vi, Annabel Whibley, Sarah Bailey, Patricia Brekke, Anna W Santure
Molecular Ecology Resources, e70024. Published 2025-08-12. DOI: 10.1111/1755-0998.70024 (https://doi.org/10.1111/1755-0998.70024)
Abstract
Genotype imputation, the inference of missing genotypes using a reference set of population haplotypes, is a cost-effective tool for improving the quality and quantity of genetic datasets. Imputation is usually applied to large and well-characterised datasets of humans and livestock, even though it could also benefit smaller natural populations. This study aims to understand the best practices and effectiveness of imputation with a small reference panel for species with low genetic diversity, using a case study of a population of the hihi/stitchbird (Notiomystis cincta). We used a leave-one-out method to test imputation on 30 high-coverage hihi individuals where SNPs were masked before being imputed with Beagle v5.4. Imputation accuracy was measured using r², the correlation between imputed and ground-truth genotype dosages. We tested combinations of five imputation parameters, the inclusion of two linkage maps, reference panels of different sizes and compositions, and targets of various SNP densities and sporadic missingness. We achieved mean r² exceeding 0.95 in most tests from a small reference panel of high-fecundity individuals. Imputation accuracy was not improved by including a linkage map and decreased at very low SNP densities. Imputed SNPs were filtered using r² to assess downstream heterozygosity calculations, the site frequency spectrum (SFS) and inference of runs of homozygosity (ROHs). We found that filtering and SNP density greatly affected heterozygosity and SFS at low SNP densities but that ROH inference was relatively robust to both. We provide a template for testing and optimising imputation in other wild populations.
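To make the accuracy metric concrete, the following minimal Python sketch computes a dosage r² per SNP as the squared Pearson correlation between true and imputed dosages, then keeps only SNPs above a threshold. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say whether r² was computed per SNP or per individual, the function names `dosage_r2` and `filter_by_r2` are hypothetical, the 0.9 threshold is arbitrary, and the small arrays are made-up toy data.

```python
import numpy as np


def dosage_r2(true_dosage, imputed_dosage):
    """Per-SNP squared Pearson correlation between true and imputed dosages.

    Both inputs have shape (n_individuals, n_snps), with dosages in [0, 2].
    SNPs where either vector has zero variance get NaN, because the
    correlation is undefined there.
    """
    t = np.asarray(true_dosage, dtype=float)
    g = np.asarray(imputed_dosage, dtype=float)
    t_c = t - t.mean(axis=0)  # centre each SNP column
    g_c = g - g.mean(axis=0)
    cov = (t_c * g_c).sum(axis=0)
    denom = (t_c ** 2).sum(axis=0) * (g_c ** 2).sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, cov ** 2 / denom, np.nan)


def filter_by_r2(r2, threshold):
    """Return indices of SNPs whose r2 is defined and >= threshold."""
    r2 = np.asarray(r2, dtype=float)
    # Map undefined (NaN) values to -1 so they never pass the threshold.
    return np.flatnonzero(np.nan_to_num(r2, nan=-1.0) >= threshold)


# Toy data (assumed, not from the paper): 4 individuals x 3 SNPs.
# In a leave-one-out design, each row of `imputed` would come from imputing
# that individual with the remaining individuals as the reference panel.
true = np.array([[0, 0, 1],
                 [1, 2, 1],
                 [2, 2, 1],
                 [1, 0, 1]])
imputed = np.array([[0.1, 0.0, 1.0],
                    [0.9, 2.0, 1.0],
                    [1.8, 2.0, 1.0],
                    [1.2, 1.0, 1.0]])

r2 = dosage_r2(true, imputed)
kept = filter_by_r2(r2, threshold=0.9)  # hypothetical cut-off
print("per-SNP r2:", r2)
print("SNP indices kept:", kept)
```

The third toy SNP is monomorphic, so its r² is undefined and it is dropped by the filter. Whether such sites should be excluded or handled separately is a design choice that a real analysis would need to justify.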