Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation–maximization maximum likelihood and increase of relationships
IF 3.6 · Zone 1 (Agricultural and Forestry Sciences) · Q1 · AGRICULTURE, DAIRY & ANIMAL SCIENCE
Andres Legarra, Matias Bermann, Quanshun Mei, Ole F. Christensen
Genetics Selection Evolution. Published 2024-05-02. DOI: 10.1186/s12711-024-00892-9 (https://doi.org/10.1186/s12711-024-00892-9)
Abstract
The theory of “metafounders” proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups) and across breeds (crosses), in a way that is compatible with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate the relationships across metafounders, $\boldsymbol{\Gamma}$, are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth).
We derive likelihood methods to estimate $\boldsymbol{\Gamma}$. For a single metafounder, summary statistics of pedigree and genomic relationships yield a cubic equation whose real root is the maximum likelihood (ML) estimate of $\boldsymbol{\Gamma}$. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood into a term related to $\boldsymbol{\Gamma}$ and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-expectation–maximization (pseudo-EM) algorithm that iteratively updates the estimate of $\boldsymbol{\Gamma}$ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth by modelling the increase of $\boldsymbol{\Gamma}$ using estimates of the rate of increase of inbreeding ($\Delta F$), resulting in an expanded $\boldsymbol{\Gamma}$ and in a pseudo-EM+$\Delta F$ algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or asymmetrical proportions, and two breeds with 10 groups per year of birth within breed. We simulate genotyping in all generations or only in the last ones.
For a single metafounder, the ML estimate obtained for the Lacaune data corresponded to the maximum of the likelihood. For simulated data, when genotypes were spread across all generations, both the GLS and the pseudo-EM(+$\Delta F$) methods were accurate. With genotypes available only in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+$\Delta F$) approach yielded more accurate and unbiased estimates.
We derived ML, pseudo-EM and pseudo-EM+$\Delta F$ methods to estimate $\boldsymbol{\Gamma}$ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.
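The pseudo-EM update described in the abstract can be sketched on a toy pedigree. The Python example below is a minimal illustration and not the authors' implementation. It assumes that metafounders enter the pedigree relationship matrix as extra leading rows with block $\boldsymbol{\Gamma}$ (tabular method), and that the metafounder block of H takes the usual single-step form $\boldsymbol{\Gamma} + \mathbf{A}_{m2}\mathbf{A}_{22}^{-1}(\mathbf{G}-\mathbf{A}_{22})\mathbf{A}_{22}^{-1}\mathbf{A}_{2m}$, with metafounders treated as ungenotyped. The function names (`a_gamma`, `pseudo_em_step`), the pedigree and the values of $\boldsymbol{\Gamma}$ are hypothetical, and G is set to the pedigree relationships under a chosen "true" $\boldsymbol{\Gamma}$ rather than computed from markers.

```python
import numpy as np


def a_gamma(pedigree, gamma):
    """Pedigree relationships with metafounders via the tabular method.

    Rows 0..m-1 are metafounders; the remaining rows are animals in birth
    order. `pedigree` holds one (sire, dam) pair of row indices per animal;
    an unknown parent is coded as the row index of a metafounder.
    """
    m = gamma.shape[0]
    n = m + len(pedigree)
    A = np.zeros((n, n))
    A[:m, :m] = gamma
    for k, (s, d) in enumerate(pedigree):
        i = m + k
        # Relationship with every earlier row: average of the two parents.
        A[i, :i] = 0.5 * (A[s, :i] + A[d, :i])
        A[:i, i] = A[i, :i]
        # Self-relationship: 1 + half the relationship between the parents.
        A[i, i] = 1.0 + 0.5 * A[s, d]
    return A


def pseudo_em_step(pedigree, gamma, G, genotyped):
    """One update: replace gamma by the metafounder block of H (assumed form)."""
    m = gamma.shape[0]
    A = a_gamma(pedigree, gamma)
    A22 = A[np.ix_(genotyped, genotyped)]   # genotyped x genotyped
    A2m = A[genotyped, :m]                  # genotyped x metafounders
    B = np.linalg.solve(A22, A2m)           # A22^{-1} A_{2m}
    return gamma + B.T @ (G - A22) @ B


# Hypothetical toy data: two metafounders (rows 0, 1) and seven animals (rows 2..8).
pedigree = [(0, 0), (0, 0), (1, 1), (1, 1),  # founders of breed 0 and breed 1
            (2, 4), (3, 5),                  # two F1 crosses
            (6, 7)]                          # one F2
genotyped = [3, 5, 6, 7, 8]
gamma_true = np.array([[0.6, 0.2], [0.2, 0.5]])
# Stand-in for a genomic relationship matrix (assumption: noise-free).
G = a_gamma(pedigree, gamma_true)[np.ix_(genotyped, genotyped)]

gamma = np.array([[0.4, 0.1], [0.1, 0.4]])   # arbitrary starting value
for _ in range(20):
    gamma = pseudo_em_step(pedigree, gamma, G, genotyped)
print(gamma)
```

Because G is built from `gamma_true` here, `gamma_true` is a fixed point of the update by construction. With marker-based G matrices the behaviour depends on the data, and the $\Delta F$ extension is not covered by this sketch.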
About the journal:
Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.