High-throughput 3D structural homology detection via NMR resonance assignment.

Proceedings. IEEE Computational Systems Bioinformatics Conference. Pub Date: 2004-01-01

Christopher James Langmead, Bruce Randall Donald

Pages: 278-89


Abstract

One goal of the structural genomics initiative is the identification of new protein folds. Sequence-based structural homology prediction methods are an important means of prioritizing unknown proteins for structure determination. However, an important challenge remains: two highly dissimilar sequences can have similar folds, so how can we detect this rapidly in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure, called HD, for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies 3D models in a protein structural database whose geometries best fit the unassigned experimental NMR data. HD does not use, and is thus not limited by, sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or homology modelling. The algorithm runs in $O(pn + pn^{5/2}\log(cn) + p\log p)$ time, where $p$ is the number of proteins in the database, $n$ is the number of residues in the target protein, and $c$ is the maximum edge weight in an integer-weighted bipartite graph. Our experiments on real NMR data from three different proteins against a database of 4,500 representative folds demonstrate that the method identifies closely related protein folds, including sub-domains of larger proteins, with as little as 10-30% sequence homology between the target protein (or sub-domain) and the computed model. In particular, we report no false negatives or false positives despite significant percentages of missing experimental data.
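The three terms of the stated bound suggest a pipeline of roughly this shape: per-model work linear in n, one integer-weighted bipartite matching per database model, and a final sort of the p scores. The sketch below illustrates only that shape and is not the authors' implementation. The function names (`max_weight_matching`, `rank_models`), the toy weight matrices, and the idea that each weight scores how well one experimental resonance fits one model residue are all assumptions made for illustration. It also swaps in the classical O(n^3) Hungarian algorithm rather than the faster matching routine that the $n^{5/2}\log(cn)$ term implies.

```python
from typing import Dict, List, Tuple


def max_weight_matching(weights: List[List[int]]) -> int:
    """Total weight of a maximum-weight perfect matching in a square
    integer-weighted bipartite graph (Hungarian algorithm, O(n^3)).

    Assumption: weights[i][j] scores how well experimental resonance i
    fits residue j of a candidate model. This is a stand-in routine,
    not the matching algorithm behind the bound quoted in the abstract.
    """
    n = len(weights)
    inf = float("inf")
    # Negate the weights so that a min-cost assignment maximizes weight.
    cost = [[-w for w in row] for row in weights]
    u = [0] * (n + 1)      # row potentials (1-indexed)
    v = [0] * (n + 1)      # column potentials (1-indexed)
    match = [0] * (n + 1)  # match[j] = row currently assigned to column j
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta = inf
            j1 = 0
            for j in range(1, n + 1):
                if used[j]:
                    continue
                cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                if cur < minv[j]:
                    minv[j] = cur
                    way[j] = j0
                if minv[j] < delta:
                    delta = minv[j]
                    j1 = j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sum(weights[match[j] - 1][j - 1] for j in range(1, n + 1))


def rank_models(db: Dict[str, List[List[int]]]) -> List[Tuple[str, int]]:
    """Score every model by its best matching, then sort best-first.

    The loop is the per-model matching step; the sort is the p log p step.
    """
    scores = [(name, max_weight_matching(w)) for name, w in db.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy database: three "models", two residues each.
toy_db = {
    "model_a": [[3, 1], [1, 3]],
    "model_b": [[1, 2], [2, 1]],
    "model_c": [[5, 0], [0, 0]],
}
ranking = rank_models(toy_db)
for name, score in ranking:
    print(name, score)
```

In this sketch the highest-scoring entries play the role of candidate structural homologs. How the edge weights would be derived from unassigned NMR data is not specified in the abstract and is left out here.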


