Optimal protein structure extraction with clustering algorithm
Jiafu Zhao, Xiaolong Zhang, Min Cao Hubei
2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2018-05-01
DOI: 10.1109/ICIEA.2018.8397889
Citations: 0
Abstract
In this paper, we use the K-means++ and affinity propagation (AP) algorithms to cluster protein structures under five similarity measures: RMSD, TM-score, MaxSub, GDT-TS and GDT-HA. To select the number of clusters, we evaluate the clustering results with the cluster-evaluation metrics in scikit-learn and choose the number of clusters that scores best. We also improve the AP algorithm by changing its preference value: instead of the usual mean, maximum or minimum value, we use the mean of the four neighbouring values at the corresponding position. Moreover, we propose a cluster-center selection algorithm based on the mean distance from the data points to the cluster center and on the number of data points in each cluster; it removes outliers automatically and thereby improves the accuracy of cluster-center selection. After clustering and cluster-center selection, the selected structures are more similar to the native protein structures than in earlier results, which is significant for protein structure prediction.
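The abstract states only that the number of clusters was chosen with scikit-learn's evaluation measures. The following is a minimal sketch of that step, assuming each structure is represented by a feature vector (for example, its row of pairwise scores under one of the five measures) and using the silhouette score as the criterion. Both choices are our assumptions, not the authors' stated setup, and the function name `choose_k_kmeanspp` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def choose_k_kmeanspp(features, k_range=range(2, 11), seed=0):
    """Run K-means++ for every k in k_range and keep the k with the highest
    silhouette score (assumed criterion). Returns (best_k, best_labels, scores_by_k).

    Every k must satisfy 2 <= k <= n_samples - 1, as silhouette_score requires."""
    X = np.asarray(features, dtype=float)
    scores, best_k, best_labels = {}, None, None
    for k in k_range:
        # init="k-means++" is scikit-learn's default seeding; it is stated explicitly here
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
        labels = km.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
        if best_k is None or scores[k] > scores[best_k]:
            best_k, best_labels = k, labels
    return best_k, best_labels, scores
```

Other scikit-learn metrics, such as `calinski_harabasz_score` or `davies_bouldin_score` (where lower is better), can be substituted in the same loop. Which metrics the authors actually used is not stated in the abstract.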
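In affinity propagation, the preference is the self-similarity assigned to each point. Larger values produce more exemplars and therefore more clusters, and scikit-learn defaults to the median similarity. The sketch below compares the mean, maximum and minimum preferences named in the abstract. The "4 neighbour" variant is implemented under one hypothetical reading: each point's preference is its mean similarity to its four most similar points. The exact definition the authors used is not given in the abstract. The helper names `run_ap` and `knn_mean_preference` are illustrative. Because RMSD is a distance, it would need to be negated (or otherwise converted) before it can serve as a similarity.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation


def knn_mean_preference(S, k=4):
    """HYPOTHETICAL reading of the '4 neighbour' preference: for each point,
    the mean similarity to its k most similar other points."""
    S = np.asarray(S, dtype=float)
    off = S.copy()
    np.fill_diagonal(off, -np.inf)        # ignore self-similarity
    top_k = np.sort(off, axis=1)[:, -k:]  # k largest similarities in each row
    return top_k.mean(axis=1)


def run_ap(S, mode="mean", seed=0):
    """Affinity propagation on a precomputed similarity matrix S (higher = more similar).

    mode selects the preference: "mean", "max" or "min" of the off-diagonal
    similarities (one scalar for every point), or "knn4" (one value per point)."""
    S = np.asarray(S, dtype=float)
    off_diag = S[~np.eye(len(S), dtype=bool)]
    if mode == "mean":
        preference = off_diag.mean()
    elif mode == "max":
        preference = off_diag.max()
    elif mode == "min":
        preference = off_diag.min()
    elif mode == "knn4":
        preference = knn_mean_preference(S, k=4)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    ap = AffinityPropagation(affinity="precomputed", preference=preference,
                             damping=0.9, max_iter=1000, random_state=seed)
    labels = ap.fit_predict(S)
    return labels, ap.cluster_centers_indices_
```

If AP fails to converge, scikit-learn issues a `ConvergenceWarning` and labels every point -1. It is therefore worth checking `cluster_centers_indices_` before using the result.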
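The abstract describes the center-selection algorithm only as combining the mean point-to-center distance with the cluster size and automatically removing outliers. The following is a minimal sketch under explicit assumptions, not the authors' algorithm. The center of each cluster is taken to be its medoid. Members farther than a hypothetical `outlier_factor` times the mean member-to-center distance are dropped. Clusters are then ranked by remaining size, with ties broken by the smaller mean distance. Input is a pairwise distance matrix, such as RMSD; for score-type measures like TM-score normalised to [0, 1], a distance such as 1 - score is assumed.

```python
import numpy as np


def medoid(D, idx):
    """Index (into D) of the member of idx with the smallest mean distance to the members."""
    sub = D[np.ix_(idx, idx)]
    return idx[int(np.argmin(sub.mean(axis=1)))]


def select_center(D, labels, outlier_factor=2.0):
    """Pick one representative structure from a clustering (hypothetical rule).

    D: (n, n) pairwise distance matrix; labels: cluster label per point.
    Returns (center_index, cluster_label, kept_member_indices)."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    candidates = []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        center = medoid(D, idx)
        dist = D[center, idx]
        # Drop members far from the center relative to the cluster's mean distance
        keep = idx[dist <= outlier_factor * dist.mean()]
        center = medoid(D, keep)          # recompute the center without outliers
        mean_d = D[center, keep].mean()
        candidates.append((len(keep), -mean_d, int(center), lab, keep))
    # Largest cluster after outlier removal wins; ties go to the tighter cluster
    size, neg_mean_d, center, lab, keep = max(candidates, key=lambda c: (c[0], c[1]))
    return center, lab, keep
```

For example, suppose one cluster has three tight members plus two distant outliers and another has four tight members. Dropping the outliers leaves the first cluster with three members, so the four-member cluster is selected. Without that step, the larger but noisier cluster would win.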