Reduced Amino Acid Substitution Matrices Find Traces of Ancient Coding Alphabets in Modern Day Proteins.

IF 5.3 1区生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular biology and evolution Pub Date : 2025-09-01 DOI:10.1093/molbev/msaf197

Jordan Douglas, Remco Bouckaert, Charles W Carter, Peter R Wills

{"title":"Reduced Amino Acid Substitution Matrices Find Traces of Ancient Coding Alphabets in Modern Day Proteins.","authors":"Jordan Douglas, Remco Bouckaert, Charles W Carter, Peter R Wills","doi":"10.1093/molbev/msaf197","DOIUrl":null,"url":null,"abstract":"<p><p>All known living systems make proteins from the same 20 canonically coded amino acids, but this was not always the case. Early genetic coding systems likely operated with a restricted pool of amino acid types and limited means to distinguish between them. Despite this, amino acid substitution models like LG and WAG all assume a constant coding alphabet over time. That makes them especially inappropriate for the aminoacyl-tRNA synthetases (aaRS)-the enzymes that govern translation. To address this limitation, we created a class of substitution models that account for evolutionary changes in the coding alphabet size by defining the transition from 19 states in a past epoch to 20 now. We use a Bayesian phylogenetic framework to improve phylogeny estimation and testing of this two-alphabet hypothesis. The hypothesis was strongly rejected by datasets composed exclusively of \"young\" eukaryotic proteins. It was generally supported by \"old\" (aaRS and non-aaRS) proteins whose origins date from before the last universal common ancestor. Standard methods overestimate the divergence ages of proteins that originated under reduced coding alphabets in both simulated and aaRS alignments. The new model provides a timeline slightly more consistent with the Earth's history. Our findings suggest that aaRS functional bifurcation events can explain much of the genetic code's evolution, but there remain other unknown forces at play too. This work provides a robust, seamless framework for reconstructing phylogenies from ancient protein datasets and offers further insights into the dawn of molecular biology.</p>","PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":5.3000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12402984/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular biology and evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/molbev/msaf197","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

All known living systems make proteins from the same 20 canonically coded amino acids, but this was not always the case. Early genetic coding systems likely operated with a restricted pool of amino acid types and limited means to distinguish between them. Despite this, amino acid substitution models like LG and WAG all assume a constant coding alphabet over time. That makes them especially inappropriate for the aminoacyl-tRNA synthetases (aaRS)-the enzymes that govern translation. To address this limitation, we created a class of substitution models that account for evolutionary changes in the coding alphabet size by defining the transition from 19 states in a past epoch to 20 now. We use a Bayesian phylogenetic framework to improve phylogeny estimation and testing of this two-alphabet hypothesis. The hypothesis was strongly rejected by datasets composed exclusively of "young" eukaryotic proteins. It was generally supported by "old" (aaRS and non-aaRS) proteins whose origins date from before the last universal common ancestor. Standard methods overestimate the divergence ages of proteins that originated under reduced coding alphabets in both simulated and aaRS alignments. The new model provides a timeline slightly more consistent with the Earth's history. Our findings suggest that aaRS functional bifurcation events can explain much of the genetic code's evolution, but there remain other unknown forces at play too. This work provides a robust, seamless framework for reconstructing phylogenies from ancient protein datasets and offers further insights into the dawn of molecular biology.

Abstract Image

查看原文本刊更多论文

减少的氨基酸替代矩阵在现代蛋白质中发现了古代编码字母的痕迹。

所有已知的生命系统都是由相同的20种标准编码氨基酸制造蛋白质，但情况并非总是如此。早期的遗传编码系统很可能是在有限的氨基酸类型池中运作的，而且区分它们的手段也有限。尽管如此，像LG和WAG这样的氨基酸替代模型都假设随着时间的推移编码字母表是恒定的。这使得它们特别不适合用于控制翻译的氨基酰基trna合成酶（aaRS）。为了解决这一限制，我们创建了一类替代模型，通过定义从过去的19个状态到现在的20个状态的转换，来解释编码字母表大小的进化变化。我们使用贝叶斯系统发育框架来改进这两个字母假设的系统发育估计和测试。这一假设被完全由“年轻”真核蛋白组成的数据集强烈否定。它通常由“古老的”（aaRS和非aaRS）蛋白质支持，这些蛋白质的起源可以追溯到最后一个普遍的共同祖先之前。标准方法高估了在模拟序列和aaRS序列中源自简化编码字母的蛋白质的分化年龄。新模型提供了一个与地球历史稍微一致的时间轴。我们的研究结果表明，aaRS的功能分岔事件可以解释遗传密码的大部分进化，但仍有其他未知的力量在起作用。这项工作为从古老的蛋白质数据集重建系统发育提供了一个强大的、无缝的框架，并为分子生物学的曙光提供了进一步的见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Molecular biology and evolution 生物-进化生物学

CiteScore

19.70

自引率

3.70%

发文量

257

审稿时长

1 months

期刊介绍： Molecular Biology and Evolution Journal Overview: Publishes research at the interface of molecular (including genomics) and evolutionary biology Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.