The evolutionary capacity of protein structures

Proceedings of the eighth annual international conference on Research in computational molecular biology. Pub Date: 2004-03-27. DOI: 10.1145/974614.974653

Leonid Meyerguz, D. Kempe, J. Kleinberg, R. Elber


Citations: 9

Abstract

In nature, one finds large collections of different protein sequences exhibiting roughly the same three-dimensional structure, and this observation underpins the study of structural protein families. In studying such families at a global level, a natural question to ask is how close to "optimal" the native sequences are in terms of their energy. We therefore define and compute the evolutionary capacity of a protein structure as the total number of sequences whose energy in the structure is below that of the native sequence. An important aspect of our definition is that we consider the space of all possible protein sequences, i.e., the exponentially large set of all strings over the 20-letter amino acid alphabet, rather than just the set of sequences found in nature.

In order to make our approach computationally feasible, we develop randomized algorithms that perform approximate enumeration in sequence space with provable performance guarantees. We draw on the area of rapidly mixing Markov chains by exhibiting a connection between the evolutionary capacity of proteins and the number of feasible solutions to the Knapsack problem. This connection allows us to design an algorithm for approximating the evolutionary capacity, extending a recent result of Morris and Sinclair on the Knapsack problem. We present computational experiments that show the method to be effective in practice on large collections of protein structures. In addition, we show how to use approximations to the evolutionary capacity to compute a statistical mechanics notion of "evolutionary temperature" on sequence space.
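To make the definition concrete, the following minimal sketch counts the sequences whose energy is below that of a native sequence on a toy problem. It is not the authors' method. The abstract does not specify the energy function, so the sketch assumes a hypothetical additive energy with one integer score per (site, residue) pair, and it counts exactly, first by brute force and then by a Knapsack-style dynamic program, instead of using the randomized Markov chain approximation the paper develops. All names (`make_toy_energy`, `capacity_brute_force`, `capacity_dp`) are illustrative.

```python
import itertools
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def make_toy_energy(length, seed=0):
    """Hypothetical additive energy: a random integer score per (site, residue)."""
    rng = random.Random(seed)
    return [{aa: rng.randint(-5, 5) for aa in ALPHABET} for _ in range(length)]


def energy(seq, table):
    """Total energy of a sequence under the assumed additive model."""
    return sum(table[i][aa] for i, aa in enumerate(seq))


def capacity_brute_force(native, table):
    """Count sequences with energy strictly below the native by full enumeration."""
    threshold = energy(native, table)
    return sum(
        1
        for seq in itertools.product(ALPHABET, repeat=len(native))
        if energy(seq, table) < threshold
    )


def capacity_dp(native, table):
    """Same count via a Knapsack-style dynamic program over integer energies."""
    threshold = energy(native, table)
    counts = {0: 1}  # counts[e] = number of sequence prefixes with total energy e
    for site in table:
        nxt = {}
        for e, c in counts.items():
            for aa in ALPHABET:
                e2 = e + site[aa]
                nxt[e2] = nxt.get(e2, 0) + c
        counts = nxt
    return sum(c for e, c in counts.items() if e < threshold)


if __name__ == "__main__":
    table = make_toy_energy(3, seed=1)
    native = "ACD"
    print(capacity_brute_force(native, table), capacity_dp(native, table))
```

Brute force visits all 20^n sequences and is only usable for very short toy lengths. Under the additive assumption with integer scores, the dynamic program needs work proportional to the number of sites times the number of distinct energy totals. With real-valued or non-additive energies no such exact shortcut is available, which is the setting where approximate counting becomes relevant.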

