DDPIn - Distance and density based protein indexing
Author: D. Hoksza
Published in: 2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology
Publication date: 2009-03-30
DOI: 10.1109/CIBCB.2009.4925737 (https://doi.org/10.1109/CIBCB.2009.4925737)
Citations: 4
Abstract
Protein structure similarity and classification methods have many applications in protein function prediction and associated fields (e.g. drug discovery). In this paper, we propose a new protein structure representation method enabling fast and accurate classification. In our approach, each protein structure is represented by a set of vectors (each based on a histogram of distances), one for each of its Cα residues. Each Cα residue serves as a viewpoint from which the distances to all other residues are computed. Subsequently, we use several methods to convert these distances into an n-dimensional feature vector, which is indexed using a metric indexing structure (the M-tree is the structure of our choice). While searching, we use a single-step or multi-step approach, which provides classification accuracy and speed comparable to the best contemporary classification methods.
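The viewpoint idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which conversion methods, bin layout, or distance function the paper uses, so everything concrete here is an assumption made for illustration: the function names `viewpoint_histograms` and `nearest_label`, the equal-width bins, the `n_bins` and `max_dist` parameters, the normalization, and the Euclidean distance. A linear scan stands in for the M-tree, which would answer the same nearest-neighbor query under any metric distance.

```python
import math


def viewpoint_histograms(ca_coords, n_bins=8, max_dist=40.0):
    """Return one normalized distance histogram per C-alpha position.

    ca_coords: list of (x, y, z) tuples, one per residue.
    n_bins and max_dist are illustrative assumptions, not values from the paper.
    """
    bin_width = max_dist / n_bins
    vectors = []
    for i, origin in enumerate(ca_coords):
        hist = [0.0] * n_bins
        for j, other in enumerate(ca_coords):
            if i == j:
                continue  # a viewpoint does not measure the distance to itself
            d = math.dist(origin, other)
            # Assumption: distances beyond max_dist are counted in the last bin.
            b = min(int(d / bin_width), n_bins - 1)
            hist[b] += 1.0
        total = sum(hist)
        if total > 0:
            # Normalize so proteins of different lengths are comparable.
            hist = [h / total for h in hist]
        vectors.append(hist)
    return vectors


def euclidean(u, v):
    """Euclidean distance, a metric, so it could also back a metric index."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_label(query_vec, indexed):
    """Linear-scan stand-in for a metric index such as an M-tree.

    indexed: list of (vector, label) pairs.
    """
    return min(indexed, key=lambda item: euclidean(query_vec, item[0]))[1]


# Toy example: four residues on a straight line, 3.8 units apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
vectors = viewpoint_histograms(coords)
```

In this toy case the structure yields four 8-dimensional vectors, one per residue. A query protein would be converted the same way, and each of its vectors would be matched against the indexed vectors of the known structures.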