Definition of metafounders based on population structure analysis
Christine Anglhuber, Christian Edel, Eduardo C. G. Pimentel, Reiner Emmerling, Kay-Uwe Götz, Georg Thaller
Genetics Selection Evolution, published 2024-06-06. DOI: https://doi.org/10.1186/s12711-024-00913-7
Citations: 0
Abstract
Limitations of the concept of identity by descent in the presence of stratification within a breeding population may lead to an incomplete formulation of the conventional numerator relationship matrix ($\mathbf{A}$). Combining $\mathbf{A}$ with the genomic relationship matrix ($\mathbf{G}$) in a single-step approach for genetic evaluation may cause inconsistencies that can be a source of bias in the resulting predictions. The objective of this study was to identify stratification using genomic data and to transfer this information to matrix $\mathbf{A}$ to improve the compatibility of $\mathbf{A}$ and $\mathbf{G}$. Using software to detect population stratification (ADMIXTURE), we developed an iterative approach. First, we identified 2 to 40 strata ($k$) with ADMIXTURE, which we then introduced in a stepwise manner into matrix $\mathbf{A}$ to generate matrix $\mathbf{A}^{\boldsymbol{\Gamma}}$ using the metafounder methodology. Improvements in consistency between $\mathbf{G}$ and $\mathbf{A}^{\boldsymbol{\Gamma}}$ were evaluated by regression analysis and by comparing the overall mean and the mean diagonal values of both matrices. The approach was tested on genotype and pedigree information of European and North American Brown Swiss animals (n = 85,249). Analyses with ADMIXTURE were initially performed on the full set of genotypes (S1). In addition, we used an alternative dataset in which we avoided sampling closely related animals (S2). The regression of standard $\mathbf{A}$ on $\mathbf{G}$ gave −0.489, 0.780 and 0.647 for the intercept, slope and fit of the regression, respectively. When analysing the S1 data, the corresponding values for the regression of $\mathbf{A}^{\boldsymbol{\Gamma}}$ on $\mathbf{G}$ were −0.028, 1.087 and 0.807 for $k = 7$, although there was no clear optimum $k$. Analyses of S2 gave a clear optimum at $k = 24$, with −0.020, 0.998 and 0.817 as results of the regression. For this $k$, differences in the mean and mean diagonal values between the two matrices were negligible. Deriving hidden stratification information from genotyped animals and integrating it into $\mathbf{A}$ considerably improved the compatibility of the resulting $\mathbf{A}^{\boldsymbol{\Gamma}}$ and $\mathbf{G}$ compared with the initial situation. In dairy breeding populations with large half-sib families as sub-structures, it is necessary to balance the data when applying population structure analysis in order to obtain meaningful results.
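The consistency checks described in the abstract (an element-wise regression of the pedigree-based matrix on the genomic matrix, plus a comparison of overall and diagonal means) can be illustrated with a minimal sketch. The function name `compare_relationship_matrices`, the use of the lower triangle including the diagonal, and reporting the fit as R² are assumptions made for illustration; the abstract does not state which matrix elements were used or how fit was measured.

```python
def compare_relationship_matrices(A, G):
    """Regress elements of A on the matching elements of G and compare means.

    A and G are square, symmetric matrices given as lists of lists, with rows
    and columns in the same animal order. Hypothetical helper, not the
    authors' implementation.
    """
    n = len(A)
    if n == 0 or len(G) != n:
        raise ValueError("A and G must be non-empty and of the same size")
    if any(len(row) != n for row in A) or any(len(row) != n for row in G):
        raise ValueError("A and G must be square")

    # Assumption: use the lower triangle including the diagonal, so each
    # symmetric off-diagonal pair enters the regression once.
    a = [A[i][j] for i in range(n) for j in range(i + 1)]
    g = [G[i][j] for i in range(n) for j in range(i + 1)]

    m = len(a)
    mean_a = sum(a) / m
    mean_g = sum(g) / m
    sxx = sum((x - mean_g) ** 2 for x in g)
    syy = sum((y - mean_a) ** 2 for y in a)
    sxy = sum((x - mean_g) * (y - mean_a) for x, y in zip(g, a))
    if sxx == 0 or syy == 0:
        raise ValueError("matrix elements must not all be identical")

    # Ordinary least squares with A as response and G as predictor.
    slope = sxy / sxx
    intercept = mean_a - slope * mean_g
    r_squared = sxy ** 2 / (sxx * syy)  # assumption: "fit" reported as R^2

    # Overall means use all n*n elements; diagonal means use the n diagonals.
    mean_all_a = sum(sum(row) for row in A) / (n * n)
    mean_all_g = sum(sum(row) for row in G) / (n * n)
    mean_diag_a = sum(A[i][i] for i in range(n)) / n
    mean_diag_g = sum(G[i][i] for i in range(n)) / n

    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "mean_diff": mean_all_a - mean_all_g,
        "mean_diag_diff": mean_diag_a - mean_diag_g,
    }


if __name__ == "__main__":
    # Toy 3-animal example with made-up values, for illustration only.
    G_toy = [[1.00, 0.20, 0.10], [0.20, 1.10, 0.30], [0.10, 0.30, 0.90]]
    A_toy = [[1.00, 0.25, 0.00], [0.25, 1.00, 0.25], [0.00, 0.25, 1.00]]
    print(compare_relationship_matrices(A_toy, G_toy))
```

Under this reading, two fully compatible matrices would give an intercept near 0, a slope near 1 and mean differences near 0, which is the pattern the abstract reports for S2 at $k = 24$.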
About the journal:
Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.