Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation–maximization maximum likelihood and increase of relationships

IF 3.1, Zone 1 (Agricultural and Forestry Sciences), JCR Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution Pub Date : 2024-05-02 DOI:10.1186/s12711-024-00892-9

Andres Legarra, Matias Bermann, Quanshun Mei, Ole F. Christensen



Abstract

The theory of “metafounders” proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups) and base populations across breeds (crosses), together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate the relationships across metafounders, $\boldsymbol{\Gamma}$, are not well adapted to highly unbalanced data, genotyped individuals far from base populations, or many unknown parent groups (within breed per year of birth). We derive likelihood methods to estimate $\boldsymbol{\Gamma}$. For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation whose real root is the maximum likelihood (ML) estimate of $\boldsymbol{\Gamma}$. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood into a term related to $\boldsymbol{\Gamma}$ and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of $\boldsymbol{\Gamma}$ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of $\boldsymbol{\Gamma}$ using estimates of the rate of increase of inbreeding ($\Delta F$), resulting in an expanded $\boldsymbol{\Gamma}$ and in a pseudo-EM+$\Delta F$ algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions; and two breeds with 10 groups per year of birth within breed. We simulate genotyping in all generations or only in the last ones.
For a single metafounder, the ML estimates for the Lacaune data corresponded to the maximum of the likelihood. For simulated data, when genotypes were spread across all generations, both the GLS and the pseudo-EM(+$\Delta F$) methods were accurate. With genotypes only available in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+$\Delta F$) approach yielded more accurate and unbiased estimates. We derived ML, pseudo-EM and pseudo-EM+$\Delta F$ methods to estimate $\boldsymbol{\Gamma}$ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.
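
The idea of modelling the increase of $\boldsymbol{\Gamma}$ through $\Delta F$ can be illustrated with a small sketch. The abstract does not give the exact parameterization, so the following rests on two assumptions: that a metafounder with self-relationship $\gamma$ behaves like a population with inbreeding $\gamma/2$, and that inbreeding follows the classical recursion $F_t = F_{t-1} + \Delta F\,(1 - F_{t-1})$. Substituting $F = \gamma/2$ gives $\gamma_t = \gamma_{t-1} + \Delta F\,(2 - \gamma_{t-1})$. The function name and arguments are hypothetical, and only the within-breed diagonal terms are shown, not the relationships between groups.

```python
def gamma_over_time(gamma0, delta_f, n_steps):
    """Self-relationship of successive year-of-birth groups (illustrative only).

    Assumes gamma_t = gamma_{t-1} + delta_f * (2 - gamma_{t-1}), which is the
    classical inbreeding recursion rewritten with F = gamma / 2. This is a
    sketch under that assumption, not necessarily the paper's exact formula.
    """
    gammas = [gamma0]
    for _ in range(n_steps):
        prev = gammas[-1]
        # The distance to the upper bound (gamma = 2) shrinks by (1 - delta_f)
        gammas.append(prev + delta_f * (2.0 - prev))
    return gammas


def gamma_closed_form(gamma0, delta_f, t):
    """Equivalent closed form: 2 - gamma_t = (2 - gamma0) * (1 - delta_f) ** t."""
    return 2.0 - (2.0 - gamma0) * (1.0 - delta_f) ** t


if __name__ == "__main__":
    # Hypothetical values: base gamma of 0.5 and a 1% rate of inbreeding per step
    for t, g in enumerate(gamma_over_time(0.5, 0.01, 5)):
        print(t, round(g, 5), round(gamma_closed_form(0.5, 0.01, t), 5))
```

Under these assumptions a single base value of $\gamma$ and one $\Delta F$ per breed determine the self-relationships of all year-of-birth groups, which suggests why such a model keeps the number of free parameters small when groups are numerous.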



