Genomic estimates of Identity-By-Descent relationships in large scale data sets.

IF 3.1 · Zone 1, Agricultural and Forestry Sciences · JCR Q1, Agriculture, Dairy & Animal Science

Genetics Selection Evolution. Pub Date: 2026-03-13. DOI: 10.1186/s12711-026-01038-9

Theo Meuwissen, Xijiang Yu, Peer Berg



Abstract

BACKGROUND

Genomic relationship and inbreeding estimates are based on genetic drift (e.g. the Genomic Relationship Matrix; GRM), homozygosity (e.g. Runs of Homozygosity; ROH), or Identity-By-Descent (IBD). A genomic IBD-based relationship matrix, Gla, is obtained by linkage analysis, which uses genomic data to distinguish paternal from maternal inheritance of chromosomal segments and so replaces the 50/50 probabilities used to calculate pedigree-based relationships (the A matrix). Our aim was to develop a fast approximate algorithm, FGla, to estimate the Gla matrix in large complex pedigrees using dense marker genotypes, and to compare Gla to A, GRM and ROH-based inbreeding (FROH) in simulated data and in a large-scale Norwegian Red cattle (NRF) data set.

RESULTS

Given pedigree data and ≥ 3 generations of 45k marker genotypes, marker positions were detected that unambiguously identified maternal or paternal inheritance, and inheritances at intermediate positions were imputed by the Viterbi algorithm from the positions with known inheritance. Any remaining unknown inheritances were randomly sampled (paternal or maternal), and the sampling errors this introduced were averaged out by the large number of marker loci used (correlation between replicated estimates: 0.9998). Calculations were also limited to the relationship coefficients that were actually needed, assuming that relationships were required only for a limited set of candidates. The accuracy of the estimated Gla coefficients increased from 0.971 to 0.998 as genotyping was extended from the NRF cattle that were actually genotyped to all animals in the pedigree. The accuracy of the GRM was 0.936, but the GRM required genotyping only of the animals whose relationships were needed. Gla relationships were approximately unbiased in the Best Linear Unbiased Prediction (BLUP) sense. Hence, if Gla-based inbreeding management predicts an increase in relationships, an identical increase in true IBD relationships is expected. Gla uses the same base population as A, namely that of the pedigree.

CONCLUSIONS

An approximate, computationally efficient multipoint linkage analysis algorithm was developed to estimate unbiased IBD-based relationship and inbreeding coefficients. Its unbiasedness and precise definition of the base population make it well suited for the genomic management of inbreeding and for genomic optimal contribution selection. In addition, Gla-based optimal contribution selection is neutral with respect to allele frequency changes.
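The imputation step described in the RESULTS can be illustrated with a minimal sketch. This is not the FGla implementation, which the abstract does not spell out; it is a generic two-state Viterbi pass under assumed settings. The function name `viterbi_inheritance`, the per-interval recombination rate `recomb`, and the genotyping error rate `error` are all hypothetical choices made for illustration. State 0 stands for paternal and state 1 for maternal inheritance; `None` marks a marker where inheritance could not be identified unambiguously.

```python
import math


def viterbi_inheritance(observed, recomb=0.01, error=0.001):
    """Most likely inheritance path (0 = paternal, 1 = maternal) along a chromosome.

    observed: one entry per marker, 0 or 1 where inheritance is known,
              None where the marker is uninformative.
    recomb, error: assumed illustrative values, not taken from the paper.
    """
    if not observed:
        return []

    log_stay = math.log(1.0 - recomb)   # no recombination between adjacent markers
    log_switch = math.log(recomb)       # recombination between adjacent markers

    def log_emit(state, obs):
        if obs is None:
            return 0.0                  # uninformative marker adds no evidence
        return math.log(1.0 - error) if obs == state else math.log(error)

    # Initialise with equal prior probability for both states.
    score = [math.log(0.5) + log_emit(s, observed[0]) for s in (0, 1)]
    back = []                           # back-pointers, one pair per later marker

    for obs in observed[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_emit(s, obs))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    markers = [0, None, None, 0, None, 1, None, 1]
    print(viterbi_inheritance(markers))
```

In this toy input the known positions force a single switch from paternal to maternal somewhere between the fourth and sixth marker, and the uninformative markers on either side are filled in from their informative neighbours. A chromosome with no informative markers at all gives this step nothing to work with; according to the abstract, such remaining unknown inheritances were randomly sampled instead.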


Source journal: Genetics Selection Evolution (Biology: Dairy and Animal Science)
CiteScore: 6.50
Self-citation rate: 9.80%
Review time: 1 month

Journal description: Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, and the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.