Estimation of demography and mutation rates from one million haploid genomes.

IF 8.1 Zone 1 (Biology) Q1 GENETICS & HEREDITY
American journal of human genetics Pub Date : 2025-09-04 Epub Date: 2025-08-13 DOI:10.1016/j.ajhg.2025.07.008
Joshua G Schraiber, Jeffrey P Spence, Michael D Edge
Pages: 2152-2166
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461025/pdf/
Citations: 0

Abstract

As genetic sequencing costs have plummeted, datasets with sizes previously unthinkable have begun to appear. Such datasets present opportunities to learn about evolutionary history, particularly via rare alleles that record the very recent past. However, beyond the computational challenges inherent in the analysis of many large-scale datasets, large population-genetic datasets present theoretical problems. In particular, the majority of population-genetic tools require the assumption that each mutant allele in the sample is the result of a single mutation (the "infinite-sites" assumption), which is violated in large samples. Here, we present DR EVIL, a method for estimating mutation rates and recent demographic history from very large samples. DR EVIL avoids the infinite-sites assumption by using a diffusion approximation to a branching-process model with recurrent mutation. This approach results in tractable likelihoods that are accurate for rare alleles. We show that DR EVIL performs well in simulations and apply it to rare-variant data from one million haploid samples. We identify mutation-rate heterogeneity even after accounting for trinucleotide context and methylation status. We also predict that at modern sample sizes, the alleles at most polymorphic sites with high mutation rates represent the descendants of multiple mutation events.
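
The scale of the infinite-sites problem can be illustrated with a back-of-the-envelope calculation. The sketch below is not DR EVIL's model (which, per the abstract, uses a diffusion approximation to a branching process with recurrent mutation). It assumes only that mutations at a single site fall on the sample genealogy as a Poisson process, and the function name, the mutation rates, and the branch lengths are all hypothetical values chosen for illustration.

```python
import math


def prob_multiple_given_any(mu: float, total_branch_length: float) -> float:
    """P(>= 2 mutations | >= 1 mutation) at one site.

    Assumption (not from the paper): the number of mutations at the site
    is Poisson with mean mu * total_branch_length, where mu is the
    per-generation mutation rate at the site and total_branch_length is
    the summed length, in generations, of the sample genealogy.
    """
    lam = mu * total_branch_length
    if lam <= 0:
        raise ValueError("mu * total_branch_length must be positive")
    p_at_least_one = -math.expm1(-lam)                    # 1 - exp(-lam)
    p_at_least_two = p_at_least_one - lam * math.exp(-lam)
    return p_at_least_two / p_at_least_one


if __name__ == "__main__":
    # Hypothetical per-site rates: a typical site and a highly mutable one.
    for mu in (1e-8, 1e-7):
        # Hypothetical total branch lengths; larger samples have longer trees.
        for length in (1e5, 1e6, 1e7):
            p = prob_multiple_given_any(mu, length)
            print(f"mu={mu:g}  L={length:g}  P(recurrent | mutated)={p:.4f}")
```

Under this simplified model the conditional probability is roughly half the expected number of mutations when that expectation is small, and approaches 1 as the genealogy grows, which matches the qualitative point that sites with high mutation rates are the first to violate the single-mutation assumption.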

Source journal
CiteScore: 14.70
Self-citation rate: 4.10%
Articles published: 185
Review time: 1 month
About the journal: The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press, chosen by The American Society of Human Genetics (ASHG) as its premier publication starting from January 2008. AJHG represents Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.