Toward Protein Structure Analysis with Self-Organizing Maps
L. Hamel, Gongqin Sun, Jing Zhang
2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology
DOI: 10.1109/CIBCB.2005.1594961
Citation count: 5
Abstract
Establishing structure-function relationships on the proteomic scale is a unique challenge for bioinformatics and the molecular biosciences. Large protein families represent natural libraries of analogues of a given catalytic or protein function, which makes them ideal targets for investigating structure-function relationships in proteins. To this end, we have developed a new technique for analyzing large amounts of detailed molecular structure information, focusing on the functional centers of homologous proteins. Our approach uses unsupervised machine learning, specifically self-organizing maps. The information a self-organizing map captures and stores in its reference models highlights the essential structure of the proteins under investigation and can be used effectively to study detailed structural differences and similarities among homologous proteins. Preliminary results obtained with a prototype based on these techniques demonstrate that we can classify proteins and identify common and unique structures within a family and, more importantly, identify common and unique structural features of different conformations of the same protein. The approach developed here outperforms many of today’s structure analysis tools, which are usually limited either by the number of proteins they can process at the same time or by the structural resolution they can accommodate. In particular, many of the tools that can handle multiple proteins simultaneously restrict themselves to secondary structure analysis and therefore miss fine structural nuances within proteins.
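The abstract does not describe the prototype in detail. What follows is therefore only a minimal sketch of a generic self-organizing map, not the authors' implementation. It assumes that each protein's functional center has already been encoded as a fixed-length numeric feature vector. The toy vectors and the names `train_som` and `best_matching_unit` are hypothetical. Each grid unit holds a reference model. For every input, training pulls the best-matching unit and its grid neighbours toward that input, so similar structures end up mapped to the same or nearby units.

```python
import math
import random


def best_matching_unit(models, x):
    """Return the grid position whose reference model is closest to x (squared Euclidean)."""
    return min(models, key=lambda pos: sum((a - b) ** 2 for a, b in zip(models[pos], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM on equal-length feature vectors.

    Returns a dict mapping grid position (r, c) to its reference model.
    Uses the classic online update: learning rate and Gaussian neighbourhood
    width both shrink linearly over training (sketch only; schedules are assumptions).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each reference model as a copy of a randomly chosen training vector.
    models = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(models, x)
            for pos, w in models.items():
                # Squared distance on the map grid (not in feature space).
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return models


# Hypothetical toy data: two "conformations" with well-separated feature vectors.
conf_a = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.2]]
conf_b = [[5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.1]]
som = train_som(conf_a + conf_b, rows=3, cols=3)
print("A ->", sorted({best_matching_unit(som, x) for x in conf_a}))
print("B ->", sorted({best_matching_unit(som, x) for x in conf_b}))
```

With well-separated toy groups like these, the two sets of inputs should map to different units of the 3x3 grid. In a real structural analysis, the feature encoding, map size, and decay schedules would all have to be chosen for the data at hand.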