Vinicius Carius de Souza, L. Goliatt, P. V. Z. C. Goliatt
Published in: 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2017-11-01. DOI: 10.1109/LA-CCI.2017.8285695 (https://doi.org/10.1109/LA-CCI.2017.8285695)
Clustering algorithms applied on analysis of protein molecular dynamics
Analyzing molecular dynamics (MD) simulations is difficult because a single simulation generates a very large number of conformations. Clustering algorithms are therefore used to group similar structures from MD trajectories, but choosing which information to cluster remains a challenge. In this work, we propose using Euclidean distance matrices (EDMs) computed from the conformations as input to clustering algorithms. We combined non-reduced data and dimensionality-reduced data (via MDS and Isomap) with four clustering algorithms (k-means, Ward, mean-shift, and affinity propagation). The results indicate that EDMs can be useful input for clustering MD conformations. For data with small variation in protein structure, the mean-shift algorithm performed well on both non-reduced and reduced data. For data with large variation in protein structure, however, the methods that work better with smooth-density data (k-means and Ward) performed well.
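The pipeline described in the abstract (EDM per conformation, optional dimensionality reduction, then clustering) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name a library, so NumPy and scikit-learn are assumed here, the trajectory is synthetic (two artificial "states" of 10 atoms), and the helper names `edm_features` and `cluster_all`, the cluster count, and all parameter values are hypothetical choices. MDS is shown for the reduction step; Isomap could be swapped in at the same place.

```python
import numpy as np
from sklearn.cluster import (AffinityPropagation, AgglomerativeClustering,
                             KMeans, MeanShift)
from sklearn.manifold import MDS


def edm_features(coords):
    """Turn each conformation into a flat vector of pairwise atom distances.

    coords: array of shape (n_frames, n_atoms, 3).
    Returns shape (n_frames, n_atoms * (n_atoms - 1) // 2): the upper
    triangle of each frame's Euclidean distance matrix.
    """
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # (n_frames, n_atoms, n_atoms)
    rows, cols = np.triu_indices(coords.shape[1], k=1)
    return dist[:, rows, cols]


def cluster_all(data):
    """Run the four algorithms named in the abstract on one feature matrix."""
    models = {
        # n_clusters=2 is an assumption that matches the synthetic data below.
        "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
        "ward": AgglomerativeClustering(n_clusters=2, linkage="ward"),
        # Mean-shift and affinity propagation choose the cluster count themselves.
        "mean-shift": MeanShift(),
        "affinity propagation": AffinityPropagation(random_state=0),
    }
    return {name: model.fit_predict(data) for name, model in models.items()}


# Synthetic stand-in for an MD trajectory: 30 noisy frames around each of
# two made-up reference structures (not real protein data).
rng = np.random.default_rng(0)
state_a = rng.normal(size=(10, 3))
state_b = 3.0 * rng.normal(size=(10, 3))
frames = np.concatenate([
    state_a + 0.05 * rng.normal(size=(30, 10, 3)),
    state_b + 0.05 * rng.normal(size=(30, 10, 3)),
])

X = edm_features(frames)                                      # non-reduced EDM features
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)  # reduced to 2-D

results = {"raw": cluster_all(X), "mds": cluster_all(X_mds)}
for space, by_algorithm in results.items():
    for name, labels in by_algorithm.items():
        print(f"{space:>3} | {name:<20} | {len(set(labels))} cluster(s)")
```

Because the EDM depends only on internal distances, the feature vectors are unaffected by rigid-body rotation or translation of a frame, so no structural superposition step is needed before clustering in this sketch.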