蛋白质进化的经验替代模型：数据库、关系和建模考虑。

IF 3.6 4区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI:10.1093/database/baaf052

Paula Iglesias-Rivas, Roberto Del Amparo, Javier A Cabaleiro, Miguel Arenas

{"title":"蛋白质进化的经验替代模型：数据库、关系和建模考虑。","authors":"Paula Iglesias-Rivas, Roberto Del Amparo, Javier A Cabaleiro, Miguel Arenas","doi":"10.1093/database/baaf052","DOIUrl":null,"url":null,"abstract":"Substitution models of protein evolution describe the patterns of amino acid substitutions over evolutionary time and are fundamental for probabilistic methods of phylogenetic inference. At the protein level, a variety of substitution models are available, but only empirical substitution models are well established in phylogenetics due to their mathematical simplicity. Despite their importance, a database compiling the large number of currently available empirical substitution models of protein evolution is lacking, although such a resource could facilitate access, assessment, and subsequent implementation of these models into phylogenetic frameworks. Besides, little is known about formal comparisons between the current set of empirical substitution models. We present EModelDB, a database of empirical substitution models of protein evolution required for probabilistic protein phylogenetics that includes the corresponding exchangeability matrices, model classification, and model-specific biological information. The database is integrated into a graphical user interface, written in Python and SQL, that facilitates its usability. We also compared common empirical substitution models in terms of the distance between their relative rates of amino acid substitution and amino frequencies at equilibrium. We found that substitution models derived from proteins related in nature tend to cluster together, reflecting similar evolutionary patterns. Indeed, we evaluated the empirical substitution models in terms of the folding stability of the derived modeled proteins and found that they generally produce less stable proteins compared to real proteins, suggesting that substitution models with additional evolutionary constraints can be preferred for studying protein evolution accounting for folding stability. Database URL: https://github.com/Paula-Iglesias-Rivas/EModelDB.","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462380/pdf/","citationCount":"0","resultStr":"{\"title\":\"Empirical substitution models of protein evolution: database, relationships, and modeling considerations.\",\"authors\":\"Paula Iglesias-Rivas, Roberto Del Amparo, Javier A Cabaleiro, Miguel Arenas\",\"doi\":\"10.1093/database/baaf052\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Substitution models of protein evolution describe the patterns of amino acid substitutions over evolutionary time and are fundamental for probabilistic methods of phylogenetic inference. At the protein level, a variety of substitution models are available, but only empirical substitution models are well established in phylogenetics due to their mathematical simplicity. Despite their importance, a database compiling the large number of currently available empirical substitution models of protein evolution is lacking, although such a resource could facilitate access, assessment, and subsequent implementation of these models into phylogenetic frameworks. Besides, little is known about formal comparisons between the current set of empirical substitution models. We present EModelDB, a database of empirical substitution models of protein evolution required for probabilistic protein phylogenetics that includes the corresponding exchangeability matrices, model classification, and model-specific biological information. The database is integrated into a graphical user interface, written in Python and SQL, that facilitates its usability. We also compared common empirical substitution models in terms of the distance between their relative rates of amino acid substitution and amino frequencies at equilibrium. We found that substitution models derived from proteins related in nature tend to cluster together, reflecting similar evolutionary patterns. Indeed, we evaluated the empirical substitution models in terms of the folding stability of the derived modeled proteins and found that they generally produce less stable proteins compared to real proteins, suggesting that substitution models with additional evolutionary constraints can be preferred for studying protein evolution accounting for folding stability. Database URL: https://github.com/Paula-Iglesias-Rivas/EModelDB.\",\"PeriodicalId\":10923,\"journal\":{\"name\":\"Database: The Journal of Biological Databases and Curation\",\"volume\":\"2025 \",\"pages\":\"\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2025-01-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462380/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Database: The Journal of Biological Databases and Curation\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/database/baaf052\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database: The Journal of Biological Databases and Curation","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baaf052","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

蛋白质进化的替代模型描述了氨基酸在进化时间中的替代模式，是系统发育推断的概率方法的基础。在蛋白质水平上，有多种替代模型可用，但由于其数学简单，只有经验替代模型在系统发育中得到了很好的建立。尽管它们很重要，但目前缺乏一个汇编大量目前可用的蛋白质进化经验替代模型的数据库，尽管这样的资源可以促进这些模型在系统发育框架中的获取、评估和后续实施。此外，对现有的一组经验替代模型之间的形式比较知之甚少。我们提出了EModelDB，这是一个蛋白质进化的经验替代模型数据库，用于概率蛋白质系统发育，包括相应的可交换性矩阵、模型分类和模型特异性生物信息。数据库被集成到一个图形用户界面中，用Python和SQL编写，方便了它的可用性。我们还比较了常见的经验取代模型在它们的相对氨基酸取代率和氨基酸频率之间的距离平衡。我们发现从自然界相关蛋白质衍生的替代模型倾向于聚集在一起，反映了相似的进化模式。事实上，我们根据衍生模型蛋白质的折叠稳定性评估了经验替代模型，发现它们通常产生的蛋白质比真实蛋白质更不稳定，这表明具有额外进化约束的替代模型可以优先用于研究蛋白质进化对折叠稳定性的影响。数据库地址：https://github.com/Paula-Iglesias-Rivas/EModelDB。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Empirical substitution models of protein evolution: database, relationships, and modeling considerations.

查看原文本刊更多论文

Empirical substitution models of protein evolution: database, relationships, and modeling considerations.

Substitution models of protein evolution describe the patterns of amino acid substitutions over evolutionary time and are fundamental for probabilistic methods of phylogenetic inference. At the protein level, a variety of substitution models are available, but only empirical substitution models are well established in phylogenetics due to their mathematical simplicity. Despite their importance, a database compiling the large number of currently available empirical substitution models of protein evolution is lacking, although such a resource could facilitate access, assessment, and subsequent implementation of these models into phylogenetic frameworks. Besides, little is known about formal comparisons between the current set of empirical substitution models. We present EModelDB, a database of empirical substitution models of protein evolution required for probabilistic protein phylogenetics that includes the corresponding exchangeability matrices, model classification, and model-specific biological information. The database is integrated into a graphical user interface, written in Python and SQL, that facilitates its usability. We also compared common empirical substitution models in terms of the distance between their relative rates of amino acid substitution and amino frequencies at equilibrium. We found that substitution models derived from proteins related in nature tend to cluster together, reflecting similar evolutionary patterns. Indeed, we evaluated the empirical substitution models in terms of the folding stability of the derived modeled proteins and found that they generally produce less stable proteins compared to real proteins, suggesting that substitution models with additional evolutionary constraints can be preferred for studying protein evolution accounting for folding stability. Database URL: https://github.com/Paula-Iglesias-Rivas/EModelDB.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Database: The Journal of Biological Databases and Curation MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

9.00

自引率

3.40%

发文量

100

审稿时长

>12 weeks

期刊介绍： Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.