Clustering algorithms applied on analysis of protein molecular dynamics
Vinicius Carius de Souza, L. Goliatt, P. V. Z. C. Goliatt
2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), published 2017-11-01
DOI: https://doi.org/10.1109/LA-CCI.2017.8285695
Citations: 8
Abstract
Analyzing molecular dynamics (MD) simulations is difficult because the method generates a large number of conformations. Clustering algorithms have therefore been applied to group similar structures from MD simulations, but choosing which information to cluster remains a challenge. In this work, we propose using Euclidean distance matrices (EDMs) computed from the conformations as input data for clustering algorithms. We combined non-reduced data or data reduced in dimensionality (with the MDS and Isomap methods) with different clustering algorithms (k-means, Ward, mean-shift, and affinity propagation). The results indicate that EDMs can be suitable input for clustering conformations from MD. For data with small variation in protein structure, the mean-shift algorithm gave good results on both non-reduced and reduced data. For data with large variation in protein structure, however, the methods that work better with smooth-density data (k-means and Ward) gave good results.
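The idea of clustering conformations by their distance matrices can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`edm_features`, `kmeans`, `toy_conformation`), the 4-atom toy "trajectory", and the farthest-point initialization are all assumptions made for illustration, and only k-means is shown. Ward, mean-shift, affinity propagation, and the MDS/Isomap reduction step named in the abstract are omitted. Each conformation is represented by the upper triangle of its EDM, so the features do not depend on where the molecule sits in space.

```python
import math
import random
from itertools import combinations


def edm_features(coords):
    """Flatten the upper triangle of a conformation's Euclidean distance matrix."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: centers are seeded deterministically (first point, then
    repeatedly the point farthest from all chosen centers).
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def toy_conformation(spacing, rng):
    """Hypothetical 4-atom chain along x, jittered and randomly translated."""
    shift = [rng.uniform(-5.0, 5.0) for _ in range(3)]
    return [
        [
            i * spacing + rng.gauss(0.0, 0.05) + shift[0],
            rng.gauss(0.0, 0.05) + shift[1],
            rng.gauss(0.0, 0.05) + shift[2],
        ]
        for i in range(4)
    ]


# Toy "trajectory": 10 compact frames followed by 10 extended frames.
rng = random.Random(42)
trajectory = [toy_conformation(1.0, rng) for _ in range(10)]
trajectory += [toy_conformation(3.0, rng) for _ in range(10)]

# One feature vector per frame: 4 atoms give 4 * 3 / 2 = 6 pairwise distances.
features = [edm_features(frame) for frame in trajectory]
labels = kmeans(features, k=2)
print(labels)
```

For a real trajectory the feature vectors grow quadratically with the number of atoms, which is one reason a dimensionality-reduction step before clustering, as the abstract describes, may be worth considering.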