A polynomial-time algorithm for a class of protein threading problems.

Computer Applications in the Biosciences (CABIOS), vol. 12, no. 6, pp. 511-517. Published 1996-12-01. DOI: 10.1093/bioinformatics/12.6.511

Y Xu, E C Uberbacher

Citations: 19

Abstract

This paper presents an algorithm for constructing an optimal alignment between a three-dimensional protein structure template and an amino acid sequence. A protein structure template is given as a sequence of amino acid residue positions in three-dimensional space, along with an array of physical properties attached to each position. These residue positions are grouped, in sequence order, into a series of core secondary structures (central helices and beta sheets). As in a traditional sequence-sequence alignment problem, the quality of a structure-sequence alignment depends on match scores and gap penalties. It also depends on interaction preferences among amino acids aligned with structure positions that are spatially close (we call these 'long-range interactions'). Although constructing such a structure-sequence alignment in its most general form is known to be NP-hard, our algorithm runs in polynomial time when restricted to structures with a 'modest' number of long-range amino acid interactions. In the current work, long-range interactions are limited to interactions between amino acids from different core secondary structures. Dividing the series of core secondary structures into two subseries defines a cut set: the long-range interactions between the two subseries. If N, M and C denote the length of the amino acid sequence, the size of the structure template, and the maximum cut size of long-range interactions, respectively, the algorithm finds an optimal structure-sequence alignment in $O(21^{C} NM)$ time, which is polynomial in N and M when $C = O(\log(N + M))$. On structure-sequence alignment problems without long-range interactions, i.e. C = 0, the algorithm has the same asymptotic computational complexity as the Smith-Waterman sequence-sequence alignment algorithm.
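To make the cut-size parameter C concrete, here is a minimal Python sketch that computes the cut size at every split point of a template's series of core secondary structures and evaluates the 21^C·N·M bound. It is not the authors' threading algorithm. It assumes each long-range interaction is supplied as a pair of core indices and that C counts interactions (not residue positions) crossing a split. The function names `cut_sizes`, `max_cut_size` and `state_count_bound` are hypothetical.

```python
def cut_sizes(num_cores, interactions):
    """Hypothetical helper: for each split of cores [0..t] | [t+1..num_cores-1],
    count the long-range interactions whose two cores fall on opposite sides.

    interactions: list of (core_a, core_b) pairs, one per long-range contact.
    Contacts inside a single core are rejected, since the abstract restricts
    long-range interactions to different core secondary structures.
    """
    for a, b in interactions:
        if a == b or not (0 <= a < num_cores and 0 <= b < num_cores):
            raise ValueError(f"invalid long-range interaction {(a, b)}")
    sizes = []
    for t in range(num_cores - 1):  # split right after core t
        crossing = sum(1 for a, b in interactions if min(a, b) <= t < max(a, b))
        sizes.append(crossing)
    return sizes


def max_cut_size(num_cores, interactions):
    """C: the largest cut over all split points (0 if there is no split)."""
    return max(cut_sizes(num_cores, interactions), default=0)


def state_count_bound(cut_size, seq_len, template_len):
    """Evaluate 21**C * N * M, the quantity inside the O(...) bound."""
    return 21 ** cut_size * seq_len * template_len


if __name__ == "__main__":
    contacts = [(0, 3), (1, 2), (0, 2)]   # toy template with 4 cores
    print(cut_sizes(4, contacts))          # per-split cut sizes
    c = max_cut_size(4, contacts)
    print(c, state_count_bound(c, 100, 80))
```

With C = 0 the bound reduces to N·M, the cost of Smith-Waterman alignment. More generally, if C ≤ k·log_21(N + M) for some constant k, then 21^C ≤ (N + M)^k, which is why a logarithmic cut size keeps the running time polynomial in N and M.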

