Huyen T. Dang, Shi Jie Samuel Tan, Sara Mathieson
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, published 2022-08-07
DOI: 10.1145/3535508.3545104
Comparison of cohort-based identical-by-descent (IBD) segment finding methods for endogamous populations
Segments of DNA that are inherited from a common ancestor are referred to as identical-by-descent (IBD). Because these segments are inherited, they allow us not only to study population characteristics and the sharing of rare variants but also to understand hidden familial relationships within populations. Over the past two decades, various IBD-finding algorithms have been developed using hidden Markov model (HMM), hashing-and-extension, and Burrows-Wheeler Transform (BWT) approaches. In this study, we investigate whether pedigree information can improve the efficacy of IBD-finding methods for endogamous populations. With the increasing prevalence of computationally efficient sequencing technology and proper documentation of pedigree structures, we expect complete pedigree information to become readily available for more populations. IBD segments have been used to reconstruct pedigrees [1]; now that the pedigree itself is available, a natural question is whether including pedigree information substantially improves IBD segment finding for the purpose of studying inheritance. We propose two types of IBD-finding algorithms that reduce the number of false positives among detected IBD segments. Both methods analyze the familial relationships between cohorts of individuals who are initially hypothesized to share IBD segments. Our first algorithm is inspired by the k-nearest neighbors (KNN) algorithm [2]: we perform outlier detection on the cohort of IBD-sharing individuals, using as the proximity metric the kinship coefficient computed from the pairwise relationships between cohort members. Our second algorithm is inspired by the Bonsai algorithm [3] and uses multiple hypothesis tests to evaluate whether an individual shares much more IBD than is expected by chance.
The Bonsai-based IBD detection algorithm first divides the pedigree into multiple cohorts of family members with no shared individuals, then picks the two cohorts with the most shared IBD, and tests each individual in the first cohort against everyone in the second cohort. If the null hypothesis is rejected for an individual, we remove that individual from the cohort, recompute the common ancestor, and recurse on the remaining individual and the new cohort. In essence, we account for recombination rates on top of Bonsai's hypothesis-test computations. We evaluate our algorithms on simulations of an endogamous Amish population to determine how effectively they remove false-positive IBD segments.
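The kinship coefficient used as the proximity metric can be computed directly from a pedigree with the classic recursive definition: an individual's kinship with itself is (1 + F)/2, where F is its inbreeding coefficient (the kinship of its parents), and otherwise the kinship of two individuals is the average kinship of one individual's parents with the other, provided the first is not an ancestor of the second. The sketch below assumes a hypothetical pedigree format (a dict mapping each individual to a (father, mother) pair, with None for unknown parents); the names Pedigree and kinship_table are illustrative and not taken from the paper.

```python
from functools import lru_cache
from typing import Callable, Dict, Optional, Tuple

# Hypothetical pedigree format: individual -> (father, mother); None = unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def kinship_table(ped: Pedigree) -> Callable[[Optional[str], Optional[str]], float]:
    """Return a memoized function phi(a, b) giving the kinship coefficient of a and b."""

    @lru_cache(maxsize=None)
    def depth(x: Optional[str]) -> int:
        # Generations below the founders; a descendant is always deeper than its ancestors.
        if x is None:
            return -1
        father, mother = ped[x]
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def phi(a: Optional[str], b: Optional[str]) -> float:
        if a is None or b is None:
            return 0.0  # unknown parent: treated as unrelated to everyone
        if a == b:
            father, mother = ped[a]
            # phi(a, a) = (1 + F_a) / 2 with F_a = phi(father, mother), a's inbreeding coefficient.
            return 0.5 * (1.0 + phi(father, mother))
        if depth(a) < depth(b):
            a, b = b, a  # recurse through the deeper one, who cannot be an ancestor of the other
        father, mother = ped[a]
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi
```

For example, two full siblings get 0.25, and a child of those siblings gets a self-kinship of 0.625, reflecting its inbreeding coefficient of 0.25. Memoizing on pairs keeps repeated queries cheap, but the cache grows with the number of pairs queried, so a large multigenerational pedigree may call for a matrix-based computation instead.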
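The KNN-inspired outlier detection described above can be sketched as follows, assuming a caller-supplied kinship function (for instance, one derived from the pedigree). Each cohort member is scored by its mean kinship to its k most closely related fellow members, and members whose score falls below a cutoff are flagged as likely false-positive carriers of the segment. The scoring rule, the default k, and the 1/32 cutoff are hypothetical choices for illustration; the abstract does not specify them.

```python
from typing import Callable, Dict, List, Sequence


def knn_kinship_outliers(
    cohort: Sequence[str],
    kinship: Callable[[str, str], float],
    k: int = 2,
    min_score: float = 1 / 32,  # hypothetical cutoff
) -> List[str]:
    """Flag cohort members whose k nearest relatives in the cohort are only distantly related."""
    if len(cohort) <= k:
        raise ValueError("cohort must contain more than k individuals")
    scores: Dict[str, float] = {}
    for i in cohort:
        # Proximity = kinship coefficient, so the nearest neighbours are the most related members.
        sims = sorted((kinship(i, j) for j in cohort if j != i), reverse=True)
        scores[i] = sum(sims[:k]) / k  # mean kinship to the k nearest neighbours
    return [i for i in cohort if scores[i] < min_score]
```

Using kinship as a similarity (rather than converting it to a distance) keeps the "nearest neighbour" step a simple descending sort; an individual who is unrelated to everyone else in the cohort scores 0 and is flagged.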
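To illustrate testing whether an individual shares much more IBD than expected by chance while accounting for recombination, the sketch below models the null number of shared IBD segments as Poisson, with a mean that grows with the genetic map length (in Morgans, i.e., the expected number of crossovers per meiosis) and halves with each additional meiosis separating the pair. This null model, its default parameters, and the Bonferroni correction across individuals are assumptions made for illustration; they are not the paper's or Bonsai's exact computations.

```python
import math
from typing import Dict, List


def expected_segments(meioses: int, shared_ancestors: int = 2,
                      genome_morgans: float = 35.0, n_autosomes: int = 22) -> float:
    """Assumed null mean number of IBD segments shared by two relatives.

    genome_morgans is the total autosomal map length, i.e. the expected number of
    crossovers per meiosis; this is where recombination rates enter the test.
    """
    if meioses < 1:
        raise ValueError("relatives are separated by at least one meiosis")
    # More meioses add crossovers (more, shorter pieces) but halve the chance
    # that any given piece is still shared.
    return shared_ancestors * (genome_morgans * meioses + n_autosomes) / 2 ** (meioses - 1)


def poisson_sf(k: int, lam: float) -> float:
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    term, cdf = math.exp(-lam), 0.0
    for i in range(k):
        cdf += term              # add P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def excess_ibd(observed: Dict[str, int], meioses: Dict[str, int], alpha: float = 0.05) -> List[str]:
    """Individuals whose observed segment count is improbably high under the null."""
    if not observed:
        return []
    # Bonferroni: one test per individual, so divide alpha by the number of tests.
    threshold = alpha / len(observed)
    return [ind for ind, k in observed.items()
            if poisson_sf(k, expected_segments(meioses[ind])) < threshold]
```

With the defaults, relatives four meioses apart are expected to share 40.5 segments, so an individual sharing 80 is flagged while one sharing 40 is not. For very small tail probabilities, subtracting the CDF from 1 loses precision; a library survival function would be more accurate.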
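The pruning step, which removes a rejected individual and recomputes the common ancestor of those who remain, can be sketched as an iterative loop over a pedigree. Here the most recent common ancestors of a cohort are its common ancestors that are not themselves ancestors of another common ancestor, and the hypothesis test is left as a caller-supplied rejects function. The pedigree format, the one-removal-per-iteration policy, and all function names are hypothetical; the abstract describes a recursion without giving its exact form.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical pedigree format: individual -> (father, mother); None = unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def ancestors(ped: Pedigree, x: str) -> Set[str]:
    """All ancestors of x, including x itself."""
    seen: Set[str] = set()
    stack: List[Optional[str]] = [x]
    while stack:
        y = stack.pop()
        if y is None or y in seen:
            continue
        seen.add(y)
        stack.extend(ped[y])  # push both parents
    return seen


def most_recent_common_ancestors(ped: Pedigree, cohort: Sequence[str]) -> Set[str]:
    """Common ancestors of the cohort that are not ancestors of another common ancestor."""
    if not cohort:
        return set()
    common = set.intersection(*(ancestors(ped, x) for x in cohort))
    return {a for a in common
            if not any(a != b and a in ancestors(ped, b) for b in common)}


def prune_cohort(
    ped: Pedigree,
    cohort: Sequence[str],
    rejects: Callable[[str, List[str], Set[str]], bool],
) -> Tuple[List[str], Set[str]]:
    """Drop rejected individuals one at a time, recomputing the common ancestors each time."""
    members = list(cohort)
    while len(members) > 1:
        mrca = most_recent_common_ancestors(ped, members)
        flagged = [x for x in members if rejects(x, members, mrca)]
        if not flagged:
            return members, mrca
        members.remove(flagged[0])  # remove one individual, then re-test everyone
    return members, most_recent_common_ancestors(ped, members)
```

Recomputing after each removal matters because dropping a single distant or unrelated individual can move the common ancestor to a more recent generation, which changes the expected IBD sharing for every remaining member.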