A multivariate clustering of AAindex database for protein numerical representation
M. Forghani, Rouhollah Khani
2017 3rd Iranian Conference on Intelligent Systems and Signal Processing (ICSPIS), published 2017-12-01
DOI: 10.1109/ICSPIS.2017.8311579 (https://doi.org/10.1109/ICSPIS.2017.8311579)
Citations: 4
Abstract
The first step in genomic signal processing is to map an alphabetical sequence to a numerical one. The choice of mapping technique depends on the application and affects the results of the study. Since biological function results from interactions among amino acids, an important approach to alphabetical-to-numerical conversion is to use the physicochemical and biochemical properties of amino acids. The AAindex database is a rich collection of such properties that can be used for the numerical representation of proteins. Each property offers one viewpoint on biological function. Taking all AAindex indices into account yields a multi-viewpoint representation and provides more options for observing and studying the target biological phenomenon. However, this advantage comes at the cost of more variables, a higher-dimensional space, and longer computation time. Because many AAindex indices are correlated with one another, the growth in dimension can be handled by extracting compact versions of the correlated indices. This paper aims to construct new indices by clustering the AAindex database using correlation distance. The results suggest that, because these new maps correlate with groups of AAindex indices (the clusters), they have the potential to be used for the numerical representation of protein sequences in different studies.
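The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or say how a cluster is condensed into a new index. The sketch assumes correlation distance is 1 minus the Pearson correlation, groups indices with a greedy threshold rule, and takes the mean of the z-scored members as the compact index. The only real property scale used is the Kyte-Doolittle hydropathy index (AAindex entry KYTJ820101). The three `toy_*` indices are derived from it purely for illustration and are not AAindex entries, and all function names and the threshold value are hypothetical.

```python
from math import sqrt

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy scale (AAindex KYTJ820101), in the order above.
KD = [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
      3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]

# Toy indices derived from KD for illustration only; NOT real AAindex entries.
INDICES = {
    "kd": KD,
    "toy_scaled": [2.0 * v + 3.0 for v in KD],
    "toy_negated": [-v for v in KD],
    "toy_neg_scaled": [-3.0 * v + 1.0 for v in KD],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Assumed definition: 1 - r, so 0 = perfectly correlated, 2 = anti-correlated."""
    return 1.0 - pearson(x, y)


def cluster_indices(indices, threshold):
    """Greedy grouping: join the first cluster whose members are all within threshold."""
    clusters = []
    for name, values in indices.items():
        for cluster in clusters:
            if all(correlation_distance(values, indices[m]) <= threshold
                   for m in cluster):
                cluster.append(name)
                break
        else:  # no existing cluster was close enough
            clusters.append([name])
    return clusters


def zscore(x):
    """Scale a vector to zero mean and unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]


def cluster_representative(indices, cluster):
    """One compact index per cluster: the mean of its z-scored members (an assumption)."""
    scaled = [zscore(indices[name]) for name in cluster]
    return [sum(col) / len(col) for col in zip(*scaled)]


def encode(sequence, index_values):
    """Map an amino-acid string to a numerical signal using one index."""
    lookup = dict(zip(AMINO_ACIDS, index_values))
    return [lookup[residue] for residue in sequence]


clusters = cluster_indices(INDICES, threshold=0.1)
new_indices = [cluster_representative(INDICES, c) for c in clusters]
signal = encode("MKVLA", new_indices[0])
print(clusters)
print(signal)
```

With these inputs the four indices fall into two groups, so a sequence is represented by two numerical signals instead of four. Because 1 - r treats anti-correlated indices as maximally distant, the negated scales land in their own cluster. A distance of 1 - |r| would merge them, and which convention the paper uses is not stated in the abstract.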