Allele ages provide limited information about the strength of negative selection

Vivaswat Shastry, Jeremy J. Berg
bioRxiv - Evolutionary Biology, published 2024-08-06
DOI: 10.1101/2024.08.06.606888 (https://doi.org/10.1101/2024.08.06.606888)

Abstract

For many problems in population genetics, it is useful to characterize the distribution of fitness effects (DFE) of de novo mutations among a certain class of sites. A DFE is typically estimated by fitting an observed site frequency spectrum (SFS) to an expected SFS given a hypothesized distribution of selection coefficients and demographic history. The development of tools to infer gene trees from haplotype alignments, along with ancient DNA resources, provides us with additional information about the frequency trajectories of segregating mutations. Here, we ask how useful this additional information is for learning about the DFE, using the joint distribution on allele frequency and age to summarize information about the trajectory. To this end, we introduce an accurate and efficient numerical method for computing the density on the age of a segregating variant found at a given sample frequency, given the strength of selection and an arbitrarily complex population size history. We then use this framework to show that the unconditional age distribution of negatively selected alleles is very closely approximated by re-weighting the neutral age distribution in terms of the negatively selected SFS, suggesting that allele ages provide very little information about the DFE beyond that already contained in the present-day frequency. To confirm this prediction, we extend the standard Poisson Random Field (PRF) method to incorporate the joint distribution of frequency and age in estimating selection coefficients, and test its performance using simulations. We find that when the full SFS is observed and the true allele ages are known, including ages in the estimation provides only small increases in the accuracy of estimated selection coefficients. However, if only sites with frequencies above a certain threshold are observed, then the true ages can provide substantial information about the selection coefficients, especially when the selection coefficient is large. When ages are estimated from haplotype data using state-of-the-art tools, uncertainty about the age abrogates most of the additional information in the fully observed SFS case, while the neutral prior assumed in these tools when estimating ages induces a downward bias in the case of the thresholded SFS.
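The SFS-only baseline that the abstract compares against can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes a constant population size, a single scaled selection coefficient `gamma` (negative for deleterious alleles), and the textbook Sawyer and Hartl frequency density, whereas the paper handles arbitrary size histories and additionally models allele ages. All function names and the grid-search estimator are hypothetical choices made for the example.

```python
import math

def selected_density(x, gamma, theta=1.0):
    """Expected density of segregating sites at population frequency x
    under the constant-size PRF (assumed textbook form).
    gamma is the scaled selection coefficient; gamma < 0 is deleterious."""
    if abs(gamma) < 1e-8:
        return theta / x  # neutral limit
    num = -math.expm1(-2.0 * gamma * (1.0 - x))  # 1 - exp(-2*gamma*(1-x))
    den = -math.expm1(-2.0 * gamma)              # 1 - exp(-2*gamma)
    return theta * num / (den * x * (1.0 - x))

def expected_sfs(n, gamma, theta=1.0, grid=2000):
    """Expected number of sites with derived count i = 1..n-1 in a sample
    of n chromosomes, by midpoint-rule integration over x in (0, 1)."""
    combs = [math.comb(n, i) for i in range(1, n)]
    sfs = [0.0] * (n - 1)
    dx = 1.0 / grid
    for k in range(grid):
        x = (k + 0.5) * dx
        w = selected_density(x, gamma, theta) * dx
        for i in range(1, n):
            # binomial sampling of i derived copies given frequency x
            sfs[i - 1] += w * combs[i - 1] * x**i * (1.0 - x) ** (n - i)
    return sfs

def poisson_loglik(observed, expected):
    """PRF log-likelihood: each SFS entry is an independent Poisson count."""
    return sum(o * math.log(e) - e - math.lgamma(o + 1.0)
               for o, e in zip(observed, expected))

def grid_mle(observed, n, gammas, theta=1.0):
    """Pick the gamma on a candidate grid that maximizes the likelihood."""
    return max(gammas,
               key=lambda g: poisson_loglik(observed, expected_sfs(n, g, theta)))

if __name__ == "__main__":
    n = 10
    print("neutral    :", [round(v, 3) for v in expected_sfs(n, 0.0)])
    print("gamma = -5 :", [round(v, 3) for v in expected_sfs(n, -5.0)])
```

Under these assumptions the deleterious spectrum is skewed toward rare variants relative to the neutral one, which is the signal an SFS-based DFE fit exploits. Very strongly negative `gamma` values would overflow `math.expm1` in this naive form, so the sketch is only suitable for moderate selection.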