The evolutionary capacity of protein structures

Leonid Meyerguz, D. Kempe, J. Kleinberg, R. Elber
Published in: Proceedings of the eighth annual international conference on Research in computational molecular biology
Publication date: 2004-03-27
DOI: 10.1145/974614.974653 (https://doi.org/10.1145/974614.974653)
Citations: 9

Abstract

In nature, one finds large collections of different protein sequences exhibiting roughly the same three-dimensional structure, and this observation underpins the study of structural protein families. In studying such families at a global level, a natural question to ask is how close to "optimal" the native sequences are in terms of their energy. We therefore define and compute the evolutionary capacity of a protein structure as the total number of sequences whose energy in the structure is below that of the native sequence. An important aspect of our definition is that we consider the space of all possible protein sequences, i.e. the exponentially large set of all strings over the 20-letter amino acid alphabet, rather than just the set of sequences found in nature.

In order to make our approach computationally feasible, we develop randomized algorithms that perform approximate enumeration in sequence space with provable performance guarantees. We draw on the area of rapidly mixing Markov chains, by exhibiting a connection between the evolutionary capacity of proteins and the number of feasible solutions to the Knapsack problem. This connection allows us to design an algorithm for approximating the evolutionary capacity, extending a recent result of Morris and Sinclair on the Knapsack problem. We present computational experiments that show the method to be effective in practice on large collections of protein structures. In addition, we show how to use approximations to the evolutionary capacity to compute a statistical mechanics notion of "evolutionary temperature" on sequence space.
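To make the definition of evolutionary capacity concrete, the following is a minimal sketch on a toy model. It assumes, purely for illustration, that the energy of a sequence in a structure is a sum of independent integer per-site contributions; the abstract does not specify the energy function, and the names `SITE_ENERGY`, `NATIVE`, `capacity_bruteforce`, and `capacity_dp` are hypothetical. Under that assumption, counting sequences below an energy threshold resembles counting feasible Knapsack solutions. The sketch uses exhaustive enumeration and an exact dynamic program, which are only practical for tiny instances; it is not the randomized Markov chain algorithm the abstract describes.

```python
from itertools import product

# Hypothetical toy data: SITE_ENERGY[i][a] is the integer energy contribution
# of letter a at structural site i (4 sites, 3-letter alphabet instead of 20).
SITE_ENERGY = [
    [0, 2, 5],
    [1, 3, 4],
    [0, 1, 6],
    [2, 2, 3],
]
NATIVE = [0, 0, 1, 0]  # assumed "native" sequence, as letter indices per site


def total_energy(site_energy, seq):
    """Additive energy of a sequence (an assumption of this sketch)."""
    return sum(site_energy[i][a] for i, a in enumerate(seq))


def capacity_bruteforce(site_energy, native):
    """Count all sequences with energy strictly below the native energy."""
    e_native = total_energy(site_energy, native)
    alphabet = range(len(site_energy[0]))
    return sum(
        1
        for seq in product(alphabet, repeat=len(site_energy))
        if total_energy(site_energy, seq) < e_native
    )


def capacity_dp(site_energy, native):
    """Same count via a knapsack-style dynamic program over energy totals."""
    e_native = total_energy(site_energy, native)
    totals = {0: 1}  # partial energy total -> number of partial sequences
    for row in site_energy:
        nxt = {}
        for energy, count in totals.items():
            for delta in row:
                nxt[energy + delta] = nxt.get(energy + delta, 0) + count
        totals = nxt
    return sum(count for energy, count in totals.items() if energy < e_native)


if __name__ == "__main__":
    print(capacity_bruteforce(SITE_ENERGY, NATIVE))
    print(capacity_dp(SITE_ENERGY, NATIVE))
```

Brute force visits every one of the k^n sequences, so it stops being usable long before realistic protein lengths with a 20-letter alphabet. The dynamic program avoids that blow-up only because the toy energies are small integers, which is one way to see why an approximate counting method is attractive for the general problem.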