Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models.

Rita Sianga-Mete, Penelope Hartnady, Wimbai Caroline Mandikumba, Kayleigh Rutherford, Christopher Brian Currin, Florence Phelanyane, Sabina Stefan, Sergei L Kosakovsky Pond, Darren Patrick Martin
Research Square (preprint), published 2025-08-08. DOI: 10.21203/rs.3.rs-2407778/v2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810213/pdf/

Abstract

Most phylogenetic trees are inferred using time-reversible evolutionary models, which assume that the relative rate of substitution between any given pair of nucleotides is the same regardless of the direction of the substitution. However, there is no reason to assume that the biochemical mutational processes underlying these substitutions are similarly symmetrical. We consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6), in which complementary substitutions occur at identical rates, making it applicable to analyzing mutational processes in double-stranded (ds) genomes; and (2) a 12-rate non-reversible model (NREV12), in which all substitution types are free to occur at different rates, making it applicable to analyzing mutational processes in single-stranded (ss) genomes. Using likelihood ratio and Akaike Information Criterion-based model tests, we show that, surprisingly, NREV12 provided a significantly better fit than the General Time Reversible (GTR) and NREV6 models to 21/31 dsRNA and 20/30 dsDNA datasets. As expected, NREV12 also provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. We then tested how non-reversibility affects the accuracy of phylogenetic tree inference. As the simulated degree of non-reversibility (DNR) increased, tree topologies inferred using both NREV12 and GTR became more accurate, whereas inferred branch lengths became less accurate. We conclude that while non-reversible models should be helpful in the analysis of mutational processes in most virus species, there is no pressing need to use them for routine phylogenetic inference.
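
The three models differ in which of the 12 directed nucleotide substitutions (X→Y) are forced to share a rate. The minimal Python sketch below makes that constraint structure explicit. It is an illustration, not the authors' implementation: the function name `rate_classes` and its labeling scheme are hypothetical, and it deliberately ignores equilibrium base frequencies, among-site rate variation and the usual convention of fixing one rate as a reference, so only the grouping of substitutions is shown.

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"
# Watson-Crick complements, used to pair a substitution with its mirror on the other strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def rate_classes(model: str) -> dict[tuple[str, str], str]:
    """Map each of the 12 directed substitutions X->Y to a shared rate label.

    Substitutions with the same label are constrained to the same relative rate.
    (Hypothetical helper for illustration only.)
    """
    classes = {}
    for x, y in permutations(NUCLEOTIDES, 2):
        if model == "NREV12":
            # Every directed substitution has its own free rate.
            label = f"{x}>{y}"
        elif model == "NREV6":
            # X->Y on one strand appears as comp(X)->comp(Y) on the other strand,
            # so the two share a rate; use the alphabetically smaller name as the label.
            label = min(f"{x}>{y}", f"{COMPLEMENT[x]}>{COMPLEMENT[y]}")
        elif model == "GTR":
            # Reversible: X->Y and Y->X share one exchangeability
            # (GTR then multiplies it by the target base's equilibrium frequency).
            label = "".join(sorted((x, y)))
        else:
            raise ValueError(f"unknown model: {model}")
        classes[(x, y)] = label
    return classes


if __name__ == "__main__":
    for model in ("GTR", "NREV6", "NREV12"):
        n_classes = len(set(rate_classes(model).values()))
        print(model, n_classes, "rate classes")
```

Under these assumptions NREV6 and GTR both have six rate classes but group different substitutions: NREV6 ties A→C to its complement T→G while leaving A→C and C→A free, whereas GTR ties A→C to C→A. NREV12 contains both constrained models as special cases.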

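Background: The vast majority of phylogenetic trees are inferred from molecular sequence data (nucleotides or amino acids) using time-reversible evolutionary models, which assume that, for any pair of nucleotide or amino acid characters, the relative rate of X-to-Y substitution is the same as the relative rate of Y-to-X substitution. However, this reversibility assumption is unlikely to accurately reflect the actual underlying biochemical and/or evolutionary processes that lead to the fixation of substitutions. Here, we use empirical viral genome sequence data to reveal that evolutionary non-reversibility is pervasive across most virus groups. Specifically, we consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6), in which Watson-Crick complementary substitutions occur at identical relative rates, and which may therefore be most applicable to analyzing the evolution of genomes whose two complementary strands are subject to the same mutational processes (e.g., double-stranded (ds) RNA or dsDNA genomes); and (2) a 12-rate non-reversible model (NREV12), in which all relative substitution types are free to occur at different rates, and which may therefore be applicable to analyzing the evolution of genomes whose complementary strands are subject to different mutational processes (as might be expected for viruses with single-stranded (ss) RNA or ssDNA genomes).

Results: Using likelihood ratio and Akaike Information Criterion-based model tests, we found that, surprisingly, NREV12 fit 21/31 dsRNA and 20/30 dsDNA datasets significantly better than the general time-reversible (GTR) and NREV6 models, whereas NREV6 provided a better fit than NREV12 and GTR in only 5/30 dsDNA and 2/31 dsRNA datasets. As expected, NREV12 provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. Next, we used simulations to show that, regardless of whether GTR or NREV12 is used to describe the mutational process, increasing degrees of strand-specific substitution bias reduce the accuracy of phylogenetic inference. However, where strand-specific substitution biases are extreme (as in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and Torque teno sus virus datasets), NREV12 tends to yield more accurate phylogenetic trees than GTR.

Conclusion: We show that NREV12 should be seriously considered during the model selection phase of phylogenetic analyses involving viral genome sequences.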
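
The likelihood ratio and AIC-based model tests described in this abstract can be sketched as follows, assuming maximum-likelihood log-likelihoods have already been computed for each model on the same alignment and tree. The likelihood ratio test applies only to nested pairs (for example, NREV6 is NREV12 with complementary rates constrained to be equal), while AIC can rank any set of models. This is a hypothetical, standard-library-only illustration: the function names, the log-likelihood values and the parameter counts (taken here as free relative rates only, 5 for NREV6 and 11 for NREV12) are assumptions rather than values from the study, and real implementations may also count frequency, rate-heterogeneity and branch-length parameters.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{j<m} y^j / j!
        return math.exp(-y) * sum(y**j / math.factorial(j) for j in range(df // 2))
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j<n} y^(j+1/2) / Gamma(j + 3/2)
    n = (df - 1) // 2
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range(n))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """LRT for nested models: statistic 2*(lnL_alt - lnL_null) and its chi-square p-value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


def aic(lnl: float, k: int) -> float:
    """Akaike Information Criterion; the model with the lowest value is preferred."""
    return 2 * k - 2 * lnl


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a single alignment (not data from the study).
    lnl = {"NREV6": -10250.0, "NREV12": -10238.5}
    # Hypothetical parameter counts: free relative rates only.
    k = {"NREV6": 5, "NREV12": 11}
    stat, p = likelihood_ratio_test(lnl["NREV6"], lnl["NREV12"], df=k["NREV12"] - k["NREV6"])
    print(f"LRT statistic = {stat:.1f}, p = {p:.2e}")
    for model in lnl:
        print(model, "AIC =", aic(lnl[model], k[model]))
```

With these made-up numbers the LRT statistic is 23.0 on 6 degrees of freedom, and NREV12 also has the lower AIC (20499.0 versus 20510.0). Because any parameters shared by both models (such as branch lengths) add the same amount to each AIC, only the difference in parameter counts affects the ranking.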