Definition of metafounders based on population structure analysis

IF 3.1; Zone 1 (Agricultural and Forestry Sciences); JCR Q1 (Agriculture, Dairy & Animal Science)

Genetics Selection Evolution. Pub Date: 2024-06-06. DOI: 10.1186/s12711-024-00913-7

Christine Anglhuber, Christian Edel, Eduardo C. G. Pimentel, Reiner Emmerling, Kay-Uwe Götz, Georg Thaller

Citations: 0

Abstract

When stratification is present within a breeding population, limitations of the concept of identity by descent may lead to an incomplete formulation of the conventional numerator relationship matrix ($\mathbf{A}$). Combining $\mathbf{A}$ with the genomic relationship matrix ($\mathbf{G}$) in a single-step approach for genetic evaluation may then cause inconsistencies that can bias the resulting predictions. The objective of this study was to identify stratification using genomic data and to transfer this information to $\mathbf{A}$ in order to improve the compatibility of $\mathbf{A}$ and $\mathbf{G}$.

We developed an iterative approach based on ADMIXTURE, a software tool for detecting population stratification. First, we identified 2 to 40 strata ($k$) with ADMIXTURE. We then introduced these strata in a stepwise manner into $\mathbf{A}$ to generate the matrix $\mathbf{A}^{\boldsymbol{\Gamma}}$ using the metafounder methodology. Improvements in consistency between $\mathbf{G}$ and $\mathbf{A}^{\boldsymbol{\Gamma}}$ were evaluated by regression analysis and by comparing the overall mean and the mean diagonal values of both matrices. The approach was tested on genotype and pedigree information of European and North American Brown Swiss animals (n = 85,249). ADMIXTURE analyses were initially performed on the full set of genotypes (S1). In addition, we used an alternative dataset in which sampling of closely related animals was avoided (S2).

The regression of standard $\mathbf{A}$ on $\mathbf{G}$ gave an intercept of −0.489, a slope of 0.780 and a fit of 0.647. With the S1 data, the regression of $\mathbf{A}^{\boldsymbol{\Gamma}}$ on $\mathbf{G}$ gave corresponding values of −0.028, 1.087 and 0.807 for $k = 7$, but there was no clear optimum $k$. With the S2 data, there was a clear optimum at $k = 24$, with regression results of −0.020, 0.998 and 0.817. For this $k$, differences in the mean and mean diagonal values between both matrices were negligible.

Deriving hidden stratification information from genotyped animals and integrating it into $\mathbf{A}$ considerably improved the compatibility of the resulting $\mathbf{A}^{\boldsymbol{\Gamma}}$ and $\mathbf{G}$ compared to the initial situation. In dairy breeding populations with large half-sib families as sub-structures, the data must be balanced when applying population structure analysis in order to obtain meaningful results.
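The consistency checks described in the abstract (regression of a pedigree-based matrix on $\mathbf{G}$, plus comparison of overall means and mean diagonals) can be illustrated with a minimal sketch. This is not the authors' code. The function name `consistency_metrics` is hypothetical, the "fit" of the regression is assumed here to be the coefficient of determination $R^2$, and the use of the lower triangle including the diagonal is likewise an assumption, since the abstract does not say which matrix elements enter the regression.

```python
import numpy as np


def consistency_metrics(A, G, include_diagonal=True):
    """Compare a pedigree-based matrix A (or A^Gamma) with a genomic matrix G.

    Hypothetical helper: regresses elements of A on the matching elements
    of G and reports differences in overall mean and mean diagonal.
    """
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or A.shape != G.shape:
        raise ValueError("A and G must be square matrices of equal shape")

    # Assumption: use each unique pair once (lower triangle), so that the
    # symmetric off-diagonal elements are not counted twice.
    offset = 0 if include_diagonal else -1
    rows, cols = np.tril_indices(A.shape[0], k=offset)
    a = A[rows, cols]
    g = G[rows, cols]

    # Simple linear regression a = intercept + slope * g.
    slope, intercept = np.polyfit(g, a, deg=1)
    # Assumption: "fit" is taken as the squared correlation (R^2).
    r2 = np.corrcoef(g, a)[0, 1] ** 2

    return {
        "intercept": float(intercept),
        "slope": float(slope),
        "r2": float(r2),
        "mean_diff": float(A.mean() - G.mean()),
        "mean_diag_diff": float(np.diag(A).mean() - np.diag(G).mean()),
    }


if __name__ == "__main__":
    # Toy 3 x 3 matrices, invented purely for illustration.
    G_toy = np.array([[1.0, 0.2, 0.1],
                      [0.2, 1.1, 0.3],
                      [0.1, 0.3, 0.9]])
    A_toy = np.array([[1.0, 0.25, 0.0],
                      [0.25, 1.0, 0.25],
                      [0.0, 0.25, 1.0]])
    print(consistency_metrics(A_toy, G_toy))
```

Under these assumptions, two fully compatible matrices would give an intercept near 0, a slope near 1 and mean differences near 0, which is the pattern the abstract reports for S2 at $k = 24$.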

Source journal: Genetics Selection Evolution (Biology: Dairy and Animal Science)

CiteScore: 6.50

Self-citation rate: 9.80%

Review time: 1 month

About the journal: Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, and the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.