A hidden Markov model to estimate homozygous-by-descent probabilities associated with nested layers of ancestors

IF 1.2 · Zone 4 (Biology) · Q4 (Ecology)

Theoretical Population Biology Pub Date : 2022-06-01 DOI:10.1016/j.tpb.2022.03.001

Tom Druet, Mathieu Gautier


Citations: 3

Abstract

Inbreeding results from the mating of related individuals and has negative consequences because it brings together deleterious variants in one individual. Genomic estimates of the inbreeding coefficients are preferred to pedigree-based estimators as they measure the realized inbreeding levels and they are more robust to pedigree errors. Several methods identifying homozygous-by-descent (HBD) segments with hidden Markov models (HMM) have been recently developed and are particularly valuable when the information is degraded or heterogeneous (e.g., low-fold sequencing, low marker density, heterogeneous genotype quality or variable marker spacing). We previously developed a multiple HBD class HMM where HBD segments are classified in different groups based on their length (e.g., recent versus old HBD segments) but we recently observed that for high inbreeding levels with many HBD segments, the estimated contributions might be biased towards more recent classes (i.e., associated with large HBD segments) although the overall estimated level of inbreeding remained unbiased. We herein propose a new model in which the HBD classification is modelled in successive nested levels with decreasing expected HBD segment lengths, the underlying exponential rates being directly related to the number of generations to the common ancestor. The non-HBD classes are now modelled as a mixture of HBD segments from later generations and shorter non-HBD segments (i.e., both with higher rates). The new model has improved statistical properties and performs better on simulated data compared to our previous version. We also show that the parameters of the model are easier to interpret and that the model is more robust to the choice of the number of classes. Overall, the new model results in an improved partitioning of inbreeding in different HBD classes and should be preferred.
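The abstract ties each HBD class to an exponential rate that reflects the number of generations to the common ancestor. A minimal sketch of what such a rate implies is given below. It is an illustration and not the authors' implementation: the function names and the example rates are hypothetical, and the conversion `rate ≈ 2 × generations` is an assumption not stated in the abstract (it is the convention commonly used for this family of models).

```python
import math


def expected_segment_length_cm(rate):
    """Mean length, in centimorgans, of a segment whose length is
    exponentially distributed with `rate` (per Morgan)."""
    return 100.0 / rate


def approx_generations_to_ancestor(rate):
    """ASSUMPTION (not in the abstract): the rate is roughly twice the
    number of generations to the common ancestor."""
    return rate / 2.0


def prob_segment_continues(rate, distance_morgans):
    """Probability that a segment with this rate is not interrupted
    over a genetic distance given in Morgans."""
    return math.exp(-rate * distance_morgans)


# Hypothetical rates for successive nested layers (recent -> ancient).
example_rates = [2, 8, 32, 128]

for rate in example_rates:
    print(
        f"rate={rate:>4}  "
        f"mean length={expected_segment_length_cm(rate):7.2f} cM  "
        f"~generations={approx_generations_to_ancestor(rate):5.1f}  "
        f"P(no break over 1 cM)={prob_segment_continues(rate, 0.01):.3f}"
    )
```

Higher rates give shorter expected segments, which matches the abstract's description of successive layers with decreasing expected HBD segment lengths.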


Source journal

Theoretical Population Biology (Biology: Evolutionary Biology)

CiteScore

2.50

Self-citation rate

14.30%

Review time

6-12 weeks

About the journal: An interdisciplinary journal, Theoretical Population Biology presents articles on theoretical aspects of the biology of populations, particularly in the areas of demography, ecology, epidemiology, evolution, and genetics. Emphasis is on the development of mathematical theory and models that enhance the understanding of biological phenomena. Articles highlight the motivation and significance of the work for advancing progress in biology, relying on a substantial mathematical effort to obtain biological insight. The journal also presents empirical results and computational and statistical methods directly impinging on theoretical problems in population biology.