Prediction of Long-range Contacts from Sequence Profile
Peng Chen, Bing Wang, H. Wong, De-shuang Huang
2007 International Joint Conference on Neural Networks, published 2007-10-29
DOI: 10.1109/IJCNN.2007.4371084 (https://doi.org/10.1109/IJCNN.2007.4371084)
Citations: 5
Abstract
The theoretical study in this paper shows that exact long-range contacts can be obtained with a single classifier, provided that the centers of the sequence profiles of residue pairs are known for both long-range contacts and non-long-range contacts. The adopted classifier, referred to as the multiple conditional probability mass function classifier (MCPMFC), finds an optimized transformation of the variables for each class, thereby yielding K separate classifiers. In the results, about 44.48% of long-range contacts lie near the sequence profile (SP) center for long-range contacts, and about 20.9% of long-range contacts are correctly predicted when considering the top L/5 predicted contacts (where L is the protein sequence length) and residue pairs 24 apart in sequence. The highest clustering result suggests that the SP center is a sound avenue for investigating contact maps in protein structures.
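The top L/5 criterion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the common convention that accuracy is the fraction of the L/5 highest-scoring candidate pairs that are true contacts, and it reads "24 apart" as a sequence separation of at least 24, which the abstract does not state explicitly. The function name `top_l_over_5_accuracy` and its parameters are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (i, j) residue indices, with i < j by convention


def top_l_over_5_accuracy(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    seq_len: int,
    min_sep: int = 24,   # assumed reading of "24 apart": separation >= 24
    divisor: int = 5,    # top L/5 predictions
) -> float:
    """Fraction of the top L/divisor scored pairs that are true contacts."""
    # Keep only candidate pairs that satisfy the sequence-separation cutoff.
    candidates = [
        (pair, score)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    # Number of predictions to evaluate: L/5, but at least one.
    k = max(1, seq_len // divisor)
    # Rank by predicted score, highest first, and keep the top k.
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    return hits / len(ranked)
```

The pairs in `scores` and `true_contacts` must use the same index ordering (here `i < j`) for the membership check to work.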