Fast simulation of identity-by-descent segments.

IF 2.2 · Zone 4 (Mathematics) · JCR Q2 (BIOLOGY)

Bulletin of Mathematical Biology 87(7): 84 · Pub Date: 2025-05-23 · DOI: 10.1007/s11538-025-01464-8

Seth D Temple, Sharon R Browning, Elizabeth A Thompson


Citations: 0

Abstract

The worst-case runtime complexity to simulate haplotype segments identical by descent (IBD) is quadratic in sample size. We propose two main techniques to reduce the compute time, both of which are motivated by coalescent and recombination processes. We provide mathematical results that explain why our algorithm should outperform a naive implementation with high probability. In our experiments, we observe that the average compute time to simulate detectable IBD segments around a locus scales approximately linearly in sample size and is a couple of seconds for samples of fewer than 10,000 diploid individuals. In contrast, we find that existing methods to simulate IBD segments take minutes to hours for samples exceeding a few thousand diploid individuals. When IBD segments are used to study recent positive selection around a locus, our efficient simulation algorithm makes feasible statistical inferences that would otherwise be intractable, e.g., parametric bootstrapping in analyses of large biobanks.
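The abstract states that the worst-case cost of simulating IBD segments is quadratic in sample size but does not spell out the naive baseline. The following is a hypothetical illustration of such a baseline, not the authors' algorithm. It simulates a Kingman coalescent at the focal locus for n haplotypes under an assumed constant diploid effective size `ne`, records a time to most recent common ancestor (TMRCA) for every one of the n(n-1)/2 pairs, and draws one segment length per pair. It assumes a segment ends at the first crossover on either side among the 2t meioses separating a pair with TMRCA t generations, so its length in Morgans is Gamma(shape 2, rate 2t). Unlike real IBD, it draws each pair independently even though pairs share one genealogy and its recombinations. The function names (`pairwise_tmrca`, `detect_prob`, `naive_detectable_segments`) and the detection threshold `w_cm` are illustrative.

```python
import math
import random
from itertools import product


def pairwise_tmrca(n, ne, rng):
    """Naively record the TMRCA (in generations) of every haplotype pair.

    Assumption: Kingman coalescent in a diploid population of constant
    effective size ne (2 * ne haplotypes). Each merge writes one entry per
    pair it joins, so the output alone has n * (n - 1) / 2 entries.
    """
    lineages = [[i] for i in range(n)]
    t = 0.0
    tmrca = {}
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, some pair coalesces at rate k(k-1)/2 per 2*ne generations.
        t += rng.expovariate(k * (k - 1) / 2 / (2 * ne))
        i, j = rng.sample(range(k), 2)
        a, b = lineages[i], lineages[j]
        # Every (x in a, y in b) pair first meets at this event: quadratic total work.
        for x, y in product(a, b):
            tmrca[(min(x, y), max(x, y))] = t
        merged = a + b
        # Pop the larger index first so the smaller index stays valid.
        for idx in sorted((i, j), reverse=True):
            lineages.pop(idx)
        lineages.append(merged)
    return tmrca


def detect_prob(t, w_cm):
    """P(segment length > w_cm) for a pair with TMRCA t generations.

    Assumption: length in Morgans is Gamma(shape=2, rate=2t), so
    P(L > w) = (1 + 2tw) * exp(-2tw) with w in Morgans.
    """
    x = 2 * t * (w_cm / 100)
    return (1 + x) * math.exp(-x)


def naive_detectable_segments(n, ne, w_cm, seed=0):
    """Draw one segment length per pair (independently, a simplification)
    and keep segments longer than w_cm centimorgans."""
    rng = random.Random(seed)
    kept = []
    for pair, t in pairwise_tmrca(n, ne, rng).items():
        # Distance to the first crossover on each side is Exp(2t) Morgans.
        length_cm = 100 * (rng.expovariate(2 * t) + rng.expovariate(2 * t))
        if length_cm > w_cm:
            kept.append((pair, t, length_cm))
    return kept


if __name__ == "__main__":
    segs = naive_detectable_segments(n=200, ne=10_000, w_cm=2.0)
    print(len(segs), "detectable segments")
```

With a 2 cM threshold, `detect_prob(10, 2.0)` is about 0.94 while `detect_prob(100, 2.0)` is about 0.09, so most of the quadratic work in this baseline goes to pairs with old TMRCAs that rarely yield detectable segments. That is one plausible reason a coalescent-aware method can avoid the quadratic cost, although the abstract does not say how the authors' two techniques achieve their speedup.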


Source journal

Bulletin of Mathematical Biology (category: Biology)

CiteScore: 3.90
Self-citation rate: 8.60%
Articles published: 123
Review time: 7.5 months

About the journal: The Bulletin of Mathematical Biology, the official journal of the Society for Mathematical Biology, disseminates original research findings and other information relevant to the interface of biology and the mathematical sciences. Contributions should be relevant to both fields. To accommodate the broad scope of new developments, the journal accepts a variety of contributions, including: original research articles focused on new biological insights gained with the help of tools from the mathematical sciences, or on new mathematical tools and methods with demonstrated applicability to biological investigations; research in mathematical biology education; reviews; commentaries; perspectives; and contributions that discuss issues important to the profession. All contributions are peer-reviewed.