Comparison of cohort-based identical-by-descent (IBD) segment finding methods for endogamous populations

Huyen T. Dang, Shi Jie Samuel Tan, Sara Mathieson
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, published 2022-08-07. DOI: 10.1145/3535508.3545104 (https://doi.org/10.1145/3535508.3545104)

Abstract

Segments of DNA that are inherited from a common ancestor are referred to as identical-by-descent (IBD). Because these segments are inherited, they allow us not only to study population characteristics and the sharing of rare variants but also to understand the hidden familial relationships within populations. Over the past two decades, various IBD-finding algorithms have been developed using hidden Markov model (HMM), hashing and extension, and Burrows-Wheeler Transform (BWT) approaches. In this study, we investigate whether pedigree information can improve the efficacy of IBD-finding methods for endogamous populations. With the increasing prevalence of computationally efficient sequencing technology and proper documentation of pedigree structures, we expect complete pedigree information to become readily available for more populations. IBD segments have been used to reconstruct pedigrees [1]; when the pedigree is already available, it is natural to ask whether including pedigree information would substantially improve IBD segment finding for the purpose of studying inheritance. Our contribution is to propose two types of IBD-finding algorithms that reduce the number of false positives among the detected IBD segments. Both methods analyze the familial relationships between cohorts of individuals who are initially hypothesized to share IBD segments. Our first algorithm is inspired by a k-nearest neighbors (KNN) algorithm [2], in which we perform outlier detection on the cohort of IBD-sharing individuals. The metric for proximity is determined by the kinship coefficient evaluated from the pairwise relationships between individuals from the cohort. Our second algorithm is inspired by the Bonsai algorithm [3] and uses multiple hypothesis tests to evaluate whether an individual shares much more IBD than is expected by chance.
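The KNN-inspired outlier detection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (mean kinship to the k most closely related cohort members), the parameters `k` and `min_score`, the function names, and the toy kinship values are all assumptions made for illustration. The only idea taken from the abstract is that proximity is measured by the pairwise kinship coefficient.

```python
from typing import Dict, List, Tuple

# Hypothetical container: pairwise kinship coefficients keyed by individual IDs.
Kinship = Dict[Tuple[str, str], float]


def kinship(phi: Kinship, a: str, b: str) -> float:
    """Look up a symmetric pairwise kinship coefficient (0.0 if unlisted)."""
    return phi.get((a, b), phi.get((b, a), 0.0))


def knn_outliers(cohort: List[str], phi: Kinship,
                 k: int = 2, min_score: float = 0.05) -> List[str]:
    """Flag cohort members whose k closest relatives are still distant.

    Assumed rule: score = mean kinship to the k nearest cohort members;
    individuals scoring below min_score are reported as outliers.
    """
    outliers = []
    for ind in cohort:
        # Higher kinship means "nearer", so sort descending and keep the top k.
        nearest = sorted(
            (kinship(phi, ind, other) for other in cohort if other != ind),
            reverse=True,
        )[:k]
        score = sum(nearest) / len(nearest) if nearest else 0.0
        if score < min_score:
            outliers.append(ind)
    return outliers


# Toy cohort with made-up values: A, B, C are close relatives, D is distant.
phi = {
    ("A", "B"): 0.25,
    ("A", "C"): 0.125,
    ("B", "C"): 0.125,
    ("A", "D"): 0.004,
    ("B", "D"): 0.002,
    ("C", "D"): 0.001,
}
print(knn_outliers(["A", "B", "C", "D"], phi))
```

In this toy cohort only `D` falls below the assumed threshold, so any IBD segment attributed to `D` would be the candidate false positive. In an endogamous population background kinship is elevated, so a fixed threshold like this one would likely need to be calibrated against the pedigree.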
Our Bonsai-inspired IBD detection algorithm first divides the pedigree into multiple cohorts of family members with no shared individuals, then picks the two cohorts with the most shared IBD, and performs a hypothesis test for each individual in the first cohort against everyone in the second cohort. If the test rejects, we remove the individual from the cohort, recompute the common ancestor, and recurse on the remaining individual and the new cohort. In essence, we account for recombination rates in addition to Bonsai's hypothesis-test computations. We evaluate our algorithms on simulations of an endogamous Amish population to determine their efficacy in removing false-positive IBD segments.
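The per-individual testing step can be sketched as follows. The abstract does not specify the null model, so this sketch assumes that the number of IBD segments an individual shares with the other cohort is Poisson-distributed under chance sharing, and it applies a Bonferroni correction across the multiple tests. The function names, the `expected` rate, and the counts are hypothetical. The cohort partitioning, common-ancestor recomputation, recursion, and recombination-rate adjustment are omitted.

```python
import math
from typing import Dict


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Compute 1 - P(X <= observed - 1), building the pmf term by term.
    term = math.exp(-expected)
    cdf = term
    for i in range(1, observed):
        term *= expected / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def excess_ibd(observed: Dict[str, int], expected: float,
               alpha: float = 0.05) -> Dict[str, bool]:
    """Mark individuals with significantly more IBD segments than chance.

    Assumed null: segment counts are Poisson with mean `expected`.
    The significance level is Bonferroni-corrected for the number of tests.
    """
    if not observed:
        return {}
    threshold = alpha / len(observed)
    return {
        ind: poisson_upper_tail(count, expected) < threshold
        for ind, count in observed.items()
    }


# Made-up segment counts for three individuals in the first cohort.
counts = {"A1": 12, "A2": 9, "A3": 1}
print(excess_ibd(counts, expected=2.0))
```

Here `A1` and `A2` share far more segments than the assumed chance rate while `A3` does not. How such a result maps to keeping or removing an individual depends on how the null hypothesis is defined in the full method, which the abstract leaves open.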