Ziyi Kang, Jie Kong, Qi Li, Juan Sui, Ping Dai, Kun Luo, Xianhong Meng, Baolong Chen, Jiawang Cao, Jian Tan, Qiang Fu, Qun Xing, Sheng Luan
{"title":"Genomic selection strategies to overcome genotype by environment interactions in biosecurity-based aquaculture breeding programs","authors":"Ziyi Kang, Jie Kong, Qi Li, Juan Sui, Ping Dai, Kun Luo, Xianhong Meng, Baolong Chen, Jiawang Cao, Jian Tan, Qiang Fu, Qun Xing, Sheng Luan","doi":"10.1186/s12711-025-00949-3","DOIUrl":"https://doi.org/10.1186/s12711-025-00949-3","url":null,"abstract":"Family-based selective breeding programs in aquaculture typically employ both between-family and within-family selection. However, these programs may show reduced genetic gain in the presence of genotype by environment interactions (G × E) when employing biosecurity-based breeding schemes (BS), compared to non-biosecurity-based breeding schemes (NBS). Fortunately, genomic selection shows promise in improving genetic gain by taking within-family variance into account. Stochastic simulation was employed to evaluate genetic gain and G × E trends in BS for improving the body weight of L. vannamei, considering selective genotyping strategies for the test group (TG) in a commercial farm environment (CE), the number of selection group (SG) individuals genotyped at the nucleus breeding center (NE), and varying levels of G × E. The loss of genetic gain in BS ranged from 9.4 to 38.9% in pedigree-based selection and was more pronounced when G × E was stronger, as quantified by a lower genetic correlation for body weight between NE and CE. Genomic selection, particularly with selective genotyping of TG individuals with extreme performance, effectively offset the loss of genetic gain. With a genetic correlation of 0.8, genotyping 20 SG individuals in each candidate family achieved 93.2% of the genetic gain observed for NBS. However, when the genetic correlation fell below 0.5, the number of genotyped SG individuals per family had to be increased to 50 or more. 
On average, genetic gain improved by 9.4% when the number of genotyped SG individuals rose from 20 to 50, but by only 2.4% when it rose from 50 to 80. In addition, the genetic correlation decreased by 0.13 on average over 30 generations of selection under BS and fluctuated across generations. Genomic selection can effectively compensate for the loss of genetic gain in BS due to G × E. However, the number of genotyped SG individuals and the level of G × E significantly affected the extra genetic gain from genomic selection. A family-based BS selective breeding program should monitor the level of G × E and genotype 50 SG individuals per candidate family to minimize the loss of genetic gain due to G × E, unless the level of G × E is confirmed to be low.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"17 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143020654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simona Antonios, Silvia T. Rodríguez-Ramilo, Andres Legarra, Jean-Michel Astruc, Luis Varona, Zulma G. Vitezica
{"title":"Genetic inbreeding load and its individual prediction for milk yield in French dairy sheep","authors":"Simona Antonios, Silvia T. Rodríguez-Ramilo, Andres Legarra, Jean-Michel Astruc, Luis Varona, Zulma G. Vitezica","doi":"10.1186/s12711-024-00945-z","DOIUrl":"https://doi.org/10.1186/s12711-024-00945-z","url":null,"abstract":"The magnitude of inbreeding depression depends on the recessive burden of the individual, which can be traced back to the hidden (recessive) inbreeding load among ancestors. However, these ancestors carry different alleles at potentially deleterious loci and therefore there is individual variability of this inbreeding load. Estimation of the additive genetic value for inbreeding load is possible using a decomposition of inbreeding in partial inbreeding components due to ancestors. Both the magnitude of variation in partial inbreeding components and the additive genetic variance of inbreeding loads are largely unknown. Our study had three objectives. First, based on substitution effect under non-random matings, we showed analytically that inbreeding load of an ancestor can be expressed as an additive genetic effect. Second, we analysed the structure of individual inbreeding by examining the contributions of specific ancestors/founders using the concept of partial inbreeding coefficients in three French dairy sheep populations (Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse). Third, we included these coefficients in a mixed model as random regression covariates, to predict genetic variance and breeding values of the inbreeding load for milk yield in the same breeds. Pedigrees included 190,276, 166,028 and 633,655 animals of Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse, respectively, born between 1985 and 2021. A fraction of 99.1% of the partial inbreeding coefficients were lower than 0.01 in all breeds, meaning that in practice inbreeding occurs in pedigree loops that span several generations backwards. 
Fewer than 5% of ancestors generate inbreeding, because mating is essentially between unrelated individuals. Inbreeding load estimations involved 658,731, 541,180 and 2,168,454 records of yearly milk yield from 178,123, 151,863 and 596,586 females in Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse, respectively. Adding the inbreeding load effect to the model improved the fit (values of the Likelihood Ratio Test statistic between 132 and 383) for milk yield in the three breeds. The inbreeding load variances were equal to 11,804 and 9435 L squared of milk yield for a fully inbred (100%) descendant in Manech Tête Noire and Manech Tête Rousse. In Basco-Béarnaise, the estimate of the inbreeding load variance (11,804) was not significantly different from zero. The correlations between (direct effect) additive genetic and inbreeding load effects were −0.09, −0.08 and −0.12 in Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse. The decomposition of inbreeding in partial coefficients in these populations shows that inbreeding is mostly due to several small contributions of ancestors (lower than 0.001) going back several generations (5 to 7 generations), which is consistent with the policy of avoiding close matings. There is variation of inbreeding load among animals, although its magnitude does not seem enough to warr","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"50 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142968270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sex identification in rainbow trout using genomic information and machine learning","authors":"Andrei A. Kudinov, Antti Kause","doi":"10.1186/s12711-024-00944-0","DOIUrl":"https://doi.org/10.1186/s12711-024-00944-0","url":null,"abstract":"Sex identification in farmed fish is important for the management of fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or premature fish. The amount of genomic data obtained from farmed fish is rapidly growing with the implementation of genomic selection in aquaculture. In comparison to mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems, with an absence of conserved genomic regions. A group of genomic markers located on a standard genotyping array has been reported to potentially be linked with sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed using probabilistic methods, where suitable markers are known beforehand. In our study, we demonstrated the use of the Extreme Gradient Boosting approach from the supervised machine learning gradient boost framework to predict sex from unimputed genomic data, when the suitability of the markers was unknown a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. 
In the real data, the method achieved a prediction accuracy of 98%, which is suitable for routine use.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"4 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Haplotype analysis incorporating ancestral origins identified novel genetic loci associated with chicken body weight using an advanced intercross line","authors":"Lina Bu, Yuzhe Wang, Lizhi Tan, Zilong Wen, Xiaoxiang Hu, Zhiwu Zhang, Yiqiang Zhao","doi":"10.1186/s12711-024-00946-y","DOIUrl":"https://doi.org/10.1186/s12711-024-00946-y","url":null,"abstract":"The genome-wide association study (GWAS) is a powerful method for mapping quantitative trait loci (QTL). However, standard GWAS can detect only QTL that segregate in the mapping population. Crossing populations with different characteristics increases genetic variability, but F2 or back-crosses lack mapping resolution due to the limited number of recombination events. This drawback can be overcome with advanced intercross line (AIL) populations, which increase the number of recombination events and provide a more accurate mapping resolution. Recent studies in humans have revealed ancestry-dependent genetic architecture and shown the effectiveness of admixture mapping in admixed populations. Through the incorporation of line-of-origin effects and GWAS on an F9 AIL population, we identified genes that affect body weight at eight weeks of age (BW8) in chickens. The proposed ancestral-haplotype-based GWAS (testing only the origin regardless of the alleles) revealed three new QTLs on GGA12, GGA15, and GGA20. By using the concepts of ancestral homozygotes (individuals that carry two haplotypes of the same origin) and ancestral heterozygotes (carrying one haplotype of each origin), we identified 632 loci that exhibited high-parent (the heterozygote is better than both parents) and mid-parent (the heterozygote is better than the median of the parents) dominance across 12 chromosomes. Out of the 199 genes associated with BW8, EYA1, PDE1C, and MYC were identified as the best candidate genes for further validation. 
In addition to the candidate genes reported in this study, our research demonstrates the effectiveness of incorporating ancestral information in population genetic analyses, which can be broadly applicable for genetic mapping in populations generated by ancestors with distinct phenotypes and genetic backgrounds. Our methods can benefit both geneticists and biologists interested in the genetic determinism of complex traits.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"64 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142858434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jón H. Eiríksson, Þórdís Þórarinsdóttir, Egill Gautason
{"title":"Predicted breeding values for relative scrapie susceptibility for genotyped and ungenotyped sheep","authors":"Jón H. Eiríksson, Þórdís Þórarinsdóttir, Egill Gautason","doi":"10.1186/s12711-024-00947-x","DOIUrl":"https://doi.org/10.1186/s12711-024-00947-x","url":null,"abstract":"Scrapie is an infectious prion disease in sheep. Selective breeding for resistant genotypes of the prion protein gene (PRNP) is an effective way to prevent scrapie outbreaks. Genotyping all selection candidates in a population is expensive, but existing pedigree records can help infer the probabilities of genotypes in relatives of genotyped animals. We used linear models to predict allele content for the various PRNP alleles found in Icelandic sheep and compiled the available estimates of relative scrapie susceptibility (RSS) associated with PRNP genotypes from the literature. Using the predicted allele content and the genotypic RSS, we calculated estimated breeding values (EBV) for RSS. We tested the predictions on simulated data under different scenarios that varied in the proportion of genotyped sheep, genotyping strategy, pedigree recording accuracy, genotyping error rates and assumed heritability of allele content. Prediction of allele content for rare alleles was less successful than for alleles with moderate frequencies. The accuracy of allele content and RSS EBV predictions was not affected by the assumed heritability, but the dispersion of the predictions was. In a scenario where 40% of rams were genotyped and there were no errors in genotyping or the recorded pedigree, the accuracy of RSS EBV for ungenotyped selection candidates was 0.49. If only 20% of rams were genotyped, or rams and ewes were genotyped randomly, or there were 10% pedigree errors, or there were 2% genotyping errors, the accuracy decreased by 0.07, 0.08, 0.03 and 0.04, respectively. With empirical data, the accuracy of RSS EBV for ungenotyped sheep was 0.46–0.65. 
A linear model for predicting allele content for the PRNP gene, combined with estimates of relative susceptibility associated with PRNP genotypes, can provide RSS EBV for scrapie resistance for ungenotyped selection candidates with accuracy up to 0.65. These RSS EBV can complement selection strategies based on PRNP genotypes, especially in populations where resistant genotypes are rare.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"54 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yvonne C. J. Wientjes, Katrijn Peeters, Piter Bijma, Abe E. Huisman, Mario P. L. Calus
{"title":"Changes in allele frequencies and genetic architecture due to selection in two pig populations","authors":"Yvonne C. J. Wientjes, Katrijn Peeters, Piter Bijma, Abe E. Huisman, Mario P. L. Calus","doi":"10.1186/s12711-024-00941-3","DOIUrl":"https://doi.org/10.1186/s12711-024-00941-3","url":null,"abstract":"Genetic selection improves a population by increasing the frequency of favorable alleles. Understanding and monitoring allele frequency changes is, therefore, important to obtain more insight into the long-term effects of selection. This study aimed to investigate changes in allele frequencies and in results of genome-wide association studies (GWAS), and how the two are related to each other. This was studied in two maternal pig lines where selection was based on a broad selection index. Genotypes and phenotypes were available from 2015 to 2021. Several large changes in allele frequencies over the years were observed in both lines. The largest allele frequency changes were not larger than expected under drift based on gene dropping simulations, but the average allele frequency change was larger with selection. Moreover, several significant regions were found in the GWAS for the traits under selection, but those regions did not overlap with regions with larger allele frequency changes. No significant GWAS regions were found in either line for the selection index, which included multiple traits, indicating that the index is affected by many loci of small effect. Additionally, many significant regions showed pleiotropic, and often antagonistic, associations with other traits under selection. This reduces the selection pressure on those regions, which can explain why those regions are still segregating, although the traits have been under selection for several generations. Across the years, only small changes in Manhattan plots were found, indicating that the genetic architecture was reasonably constant. 
No significant GWAS regions were found for any of the traits under selection among the regions with the largest changes in allele frequency, and the correlation between significance level of marker associations and changes in allele frequency over one generation was close to zero for all traits. Moreover, the largest changes in allele frequency could be explained by drift and were not necessarily a result of selection. This is probably because selection acted on a broad index for which no significant GWAS regions were found. Our results show that selecting on a broad index spreads the selection pressure across the genome, thereby limiting allele frequency changes.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"26 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142832236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Stricker, Rohan L. Fernando, Albrecht Melchinger, Hans-Juergen Auinger, Chris-Carolin Schoen
{"title":"On the inverse association between the number of QTL and the trait-specific genomic relationship of a candidate to the training set.","authors":"Christian Stricker, Rohan L. Fernando, Albrecht Melchinger, Hans-Juergen Auinger, Chris-Carolin Schoen","doi":"10.1186/s12711-024-00940-4","DOIUrl":"https://doi.org/10.1186/s12711-024-00940-4","url":null,"abstract":"Accuracy of genomic prediction depends on the heritability of the trait, the size of the training set, the relationship of the candidates to the training set, and $$\\text{Min}(N_{\\text{QTL}},M_e)$$, where $$N_{\\text{QTL}}$$ is the number of QTL and $$M_e$$ is the number of independently segregating chromosomal segments. Due to LD, the number $$Q_e$$ of independently segregating QTL (effective QTL) can be lower than $$\\text{Min}(N_{\\text{QTL}},M_e)$$. In this paper, we show that $$Q_e$$ is inversely associated with the trait-specific genomic relationship of a candidate to the training set. This provides an explanation for the inverse association between $$Q_e$$ and the accuracy of prediction. To quantify the genomic relationship of a candidate to all members of the training set, we considered the $$k^2$$ statistic that has been previously used for this purpose. It quantifies how well the marker covariate vector of a candidate can be represented as a linear combination of the rows of the marker covariate matrix of the training set. In this paper, we used Bayesian regression to make this statistic trait specific and argue that the trait-specific genomic relationship of a candidate to the training set is inversely associated with $$Q_e$$. Simulation was used to demonstrate the dependence of the trait-specific $$k^2$$ statistic on $$Q_e$$, which is related to $$N_{\\text{QTL}}$$. The posterior distributions of the trait-specific $$k^2$$ statistic showed that the trait-specific genomic relationship between a candidate and the training set is inversely associated with $$Q_e$$ and $$N_{\\text{QTL}}$$. 
Further, we show that the trait-specific genomic relationship between a candidate and the training set is directly related to the size of the training set.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"73 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142815891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
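The abstract above describes the k² statistic only in words: how well a candidate's marker covariate vector can be represented as a linear combination of the rows of the training marker matrix. A minimal sketch of that idea follows. It assumes k² is the share of the candidate vector's squared length that lies in the row space of the training matrix; this is an assumption of the sketch, not a formula given in the record. The function name `k_squared` and the toy vectors are hypothetical, and the trait-specific Bayesian version from the paper is not attempted here.

```python
def k_squared(candidate, training_rows, tol=1e-12):
    """Share of the candidate's squared length that lies in the row space
    of the training marker matrix (assumed reading of the k^2 statistic)."""
    basis = []  # orthonormal basis of the row space, built by Gram-Schmidt
    for row in training_rows:
        v = [float(x) for x in row]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm_sq = sum(vi * vi for vi in v)
        if norm_sq > tol:  # skip rows that add no new direction
            norm = norm_sq ** 0.5
            basis.append([vi / norm for vi in v])

    total = sum(float(c) ** 2 for c in candidate)
    if total == 0.0:
        raise ValueError("candidate vector has zero length")

    # Squared length of the projection onto the row space.
    projected = sum(
        sum(float(c) * bi for c, bi in zip(candidate, b)) ** 2 for b in basis
    )
    return projected / total


# Toy (hypothetical) centred marker covariates: two training animals, three markers.
training = [[1, 0, 0], [0, 1, 0]]
print(k_squared([1, 1, 0], training))  # candidate fully inside the row space
print(k_squared([1, 0, 1], training))  # candidate only partly represented
```

Under this reading, a value near 1 means the candidate is well represented by the training set, and a value near 0 means it is not.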
Ismo Strandén, Esa A. Mäntysaari, Martin H. Lidauer, Robin Thompson, Hongding Gao
{"title":"A computationally efficient algorithm to leverage average information REML for (co)variance component estimation in the genomic era","authors":"Ismo Strandén, Esa A. Mäntysaari, Martin H. Lidauer, Robin Thompson, Hongding Gao","doi":"10.1186/s12711-024-00939-x","DOIUrl":"https://doi.org/10.1186/s12711-024-00939-x","url":null,"abstract":"Methods for estimating variance components (VC) using restricted maximum likelihood (REML) typically require elements from the inverse of the coefficient matrix of the mixed model equations (MME). As genomic information becomes more prevalent, the coefficient matrix of the MME becomes denser, presenting a challenge for analyzing large datasets. Thus, computational algorithms based on iterative solving and Monte Carlo approximation of the inverse of the coefficient matrix become appealing. While the standard average information REML (AI-REML) is known for its rapid convergence, its computational intensity imposes limitations. In particular, the standard AI-REML requires solving the MME for each VC, which can be computationally demanding, especially when dealing with complex models with many VC. To bridge this gap, here we (1) present a computationally efficient and tractable algorithm, named the augmented AI-REML, which facilitates the AI-REML by solving an augmented MME only once within each REML iteration; and (2) implement this approach for VC estimation in a general framework of a multi-trait GBLUP model. VC estimation was investigated based on the number of VC in the model, including a two-trait, three-trait, four-trait, and five-trait GBLUP model. We compared the augmented AI-REML with the standard AI-REML in terms of computing time per REML iteration. Direct and iterative solving methods were used to assess the advances of the augmented AI-REML. 
When using the direct solving method, the augmented AI-REML and the standard AI-REML required similar computing times for models with a small number of VC (the two- and three-trait GBLUP model), while the augmented AI-REML demonstrated more notable reductions in computing time as the number of VC in the model increased. When using the iterative solving method, the augmented AI-REML demonstrated substantial improvements in computational efficiency compared to the standard AI-REML. The elapsed time of each REML iteration was reduced by 75%, 84%, and 86% for the two-, three-, and four-trait GBLUP models, respectively. The augmented AI-REML can considerably reduce the computing time within each REML iteration, particularly when using an iterative solver. Our results demonstrate the potential of the augmented AI-REML as an appealing approach for large-scale VC estimation in the genomic era.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"11 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142678314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alan M. Pardo, Andres Legarra, Zulma G. Vitezica, Natalia S. Forneris, Daniel O. Maizon, Sebastián Munilla
{"title":"On the ability of the LR method to detect bias when there is pedigree misspecification and lack of connectedness","authors":"Alan M. Pardo, Andres Legarra, Zulma G. Vitezica, Natalia S. Forneris, Daniel O. Maizon, Sebastián Munilla","doi":"10.1186/s12711-024-00943-1","DOIUrl":"https://doi.org/10.1186/s12711-024-00943-1","url":null,"abstract":"Cross-validation techniques in genetic evaluations are limited by the unobservable nature of breeding values and by the challenge of validating estimated breeding values (EBVs) against pre-corrected phenotypes; the Linear Regression (LR) method is an alternative that addresses these limitations. Furthermore, beef cattle genetic evaluation programs confront challenges with connectedness among herds and pedigree errors. The objective of this work was to evaluate, through simulation, the LR method's performance under the pedigree errors and weak connectedness typical of beef cattle genetic evaluations. We simulated a beef cattle population resembling the Argentinean Brangus, including a quantitative trait with a heritability of 0.4 selected over six pseudo-generations. This study considered various scenarios, including: 25% and 40% pedigree errors (PE-25 and PE-40), weak and strong connectedness among herds (WCO and SCO, respectively), and a benchmark scenario (BEN) with complete pedigree and optimal herd connections. Over six pseudo-generations of selection, genetic gain was simulated to be under- and over-estimated in PE-40 and WCO, respectively, contrary to the BEN scenario, which was unbiased. In genetic evaluations with PE-25 and PE-40, true biases of −0.13 and −0.18 genetic standard deviations were simulated, respectively. In the BEN scenario, the LR method accurately estimated bias; however, in the PE-25 and PE-40 scenarios, it overestimated biases by 0.17 and 0.25 genetic standard deviations, respectively. 
In herds facing WCO, significant true bias due to confounding environmental and genetic effects was simulated, and the corresponding LR statistic failed to accurately estimate the magnitude and direction of this bias. On average, true dispersion values were close to one for BEN, PE-40, SCO and WCO, showing no significant inflation or deflation, and the values were accurately estimated by LR. However, PE-25 exhibited inflation of EBVs and was slightly underestimated by LR. Accuracies and reliabilities showed good agreement between true and LR estimated values for the scenarios evaluated. The LR method demonstrated limitations in identifying biases induced by incomplete pedigrees, including scenarios with as much as 40% pedigree errors, or lack of connectedness, but it was effective in assessing dispersion, and population accuracies and reliabilities even in the challenging scenarios addressed.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"14 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142678315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
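The record above refers to LR statistics for bias, dispersion and accuracy without defining them. The sketch below uses the definitions usually associated with the LR method: EBVs of the same validation animals are compared between a partial (earlier) and a whole (later) evaluation. Bias is taken as the difference in mean EBV, dispersion as the regression slope of whole on partial EBVs, and the accuracy ratio as their correlation. These definitions are not given in this record and are an assumption here; the function name `lr_statistics` and the toy numbers are hypothetical.

```python
from math import sqrt


def lr_statistics(ebv_partial, ebv_whole):
    """Return (bias, dispersion, correlation) from paired EBVs of the same
    validation animals in a partial and a whole evaluation."""
    if len(ebv_partial) != len(ebv_whole) or len(ebv_partial) < 2:
        raise ValueError("need two equally long lists with at least 2 animals")
    n = len(ebv_partial)
    mean_p = sum(ebv_partial) / n
    mean_w = sum(ebv_whole) / n
    cov_wp = sum((p - mean_p) * (w - mean_w)
                 for p, w in zip(ebv_partial, ebv_whole)) / (n - 1)
    var_p = sum((p - mean_p) ** 2 for p in ebv_partial) / (n - 1)
    var_w = sum((w - mean_w) ** 2 for w in ebv_whole) / (n - 1)

    bias = mean_p - mean_w                   # expected 0 when unbiased
    dispersion = cov_wp / var_p              # expected 1; < 1 suggests inflated partial EBVs
    correlation = cov_wp / sqrt(var_p * var_w)
    return bias, dispersion, correlation


# Toy (hypothetical) EBVs for four validation animals.
partial = [1.0, 2.0, 3.0, 4.0]
whole = [1.5, 2.5, 3.5, 4.5]
print(lr_statistics(partial, whole))
```

In the toy data the whole-evaluation EBVs are shifted by a constant, so only the bias statistic departs from its expected value.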
Tuan V. Nguyen, Sunduimijid Bolormaa, Coralie M. Reich, Amanda J. Chamberlain, Christy J. Vander Jagt, Hans D. Daetwyler, Iona M. MacLeod
{"title":"Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation","authors":"Tuan V. Nguyen, Sunduimijid Bolormaa, Coralie M. Reich, Amanda J. Chamberlain, Christy J. Vander Jagt, Hans D. Daetwyler, Iona M. MacLeod","doi":"10.1186/s12711-024-00942-2","DOIUrl":"https://doi.org/10.1186/s12711-024-00942-2","url":null,"abstract":"Genotype imputation is a cost-effective method for obtaining sequence genotypes for downstream analyses such as genome-wide association studies (GWAS). However, low imputation accuracy can increase the risk of false positives, so it is important to pre-filter data or at least assess the potential limitations due to imputation accuracy. In this study, we benchmarked three different imputation programs (Beagle 5.2, Minimac4 and IMPUTE5) and compared the empirical accuracy of imputation with the software estimated accuracy of imputation (Rsqsoft). We also tested the accuracy of imputation in cattle for autosomal and X chromosomes, SNP and INDEL, when imputing from either low-density or high-density genotypes. The accuracy of imputing sequence variants from real high-density genotypes was higher than from low-density genotypes. In our software benchmark, all programs performed well with only minor differences in accuracy. While there was a close relationship between empirical imputation accuracy and the imputation Rsqsoft, this differed considerably for Minimac4 compared to Beagle 5.2 and IMPUTE5. We found that the Rsqsoft threshold for removing poorly imputed variants must be customised according to the software and this should be accounted for when merging data from multiple studies, such as in meta-GWAS studies. We also found that imposing an Rsqsoft filter has a positive impact on genomic regions with poor imputation accuracy due to large segmental duplications that are susceptible to error-prone alignment. 
Overall, our results showed that on average the imputation accuracy for INDEL was approximately 6% lower than SNP for all software programs. Importantly, the imputation accuracy for the non-PAR (non-Pseudo-Autosomal Region) of the X chromosome was comparable to autosomal imputation accuracy, while for the PAR it was substantially lower, particularly when starting from low-density genotypes. This study provides an empirically derived approach to apply customised software-specific Rsqsoft thresholds for downstream analyses of imputed variants, such as needed for a meta-GWAS. The very poor empirical imputation accuracy for variants on the PAR when starting from low density genotypes demonstrates that this region should be imputed starting from a higher density of real genotypes.","PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"29 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142637636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
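The record above contrasts empirical imputation accuracy with the software-estimated accuracy (Rsqsoft) and recommends software-specific filtering thresholds. A minimal sketch of both steps follows. It assumes empirical accuracy is measured as the squared Pearson correlation between true and imputed allele dosages for one variant, which is a common convention but is not stated in this record. The names `empirical_r2` and `keep_variant`, the tool labels and the threshold values are hypothetical placeholders, not values from the study.

```python
def empirical_r2(true_dosages, imputed_dosages):
    """Squared Pearson correlation between true and imputed dosages
    for a single variant (assumed accuracy measure)."""
    n = len(true_dosages)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equally long dosage lists with at least 2 animals")
    mean_t = sum(true_dosages) / n
    mean_i = sum(imputed_dosages) / n
    sxy = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_dosages, imputed_dosages))
    sxx = sum((t - mean_t) ** 2 for t in true_dosages)
    syy = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined without variation; treated as 0 in this sketch
    return (sxy * sxy) / (sxx * syy)


# Placeholder thresholds only; real values would have to be derived empirically
# for each imputation program.
HYPOTHETICAL_THRESHOLDS = {"tool_a": 0.8, "tool_b": 0.6}


def keep_variant(rsq_soft, software, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Keep a variant only if its software-estimated accuracy reaches the
    threshold chosen for the program that produced it."""
    return rsq_soft >= thresholds[software]


print(empirical_r2([0, 1, 2, 2], [0.1, 0.9, 1.8, 2.0]))
print(keep_variant(0.7, "tool_a"), keep_variant(0.7, "tool_b"))
```

Keeping the thresholds in a per-program mapping reflects the point made in the abstract that one shared cut-off does not mean the same thing across programs, for example when merging data for a meta-GWAS.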