Optimal protein structure extraction with clustering algorithm

2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA) Pub Date : 2018-05-01 DOI:10.1109/ICIEA.2018.8397889

Jiafu Zhao, Xiaolong Zhang, Min Cao Hubei

Citations: 0

Abstract

In this paper, we use the K-means++ and affinity propagation (AP) algorithms to cluster five protein similarity measures: RMSD, TM, MaxSub, GDT-TS and GDT-HA. To choose the number of clusters, we evaluate the clustering results with the metrics provided by Scikit-learn and select the number that scores best. We also tune the AP algorithm by changing its preference value, moving from the usual mean, maximum or minimum value to the mean value of the 4 neighbours at the corresponding position. In addition, we propose a cluster-center selection algorithm based on the mean distance from the data points to the cluster center and on the number of data points in a cluster. It automatically removes exceptional values and so improves the accuracy of cluster-center selection. After clustering and cluster-center selection, the selected protein structures are more similar to the native protein than in earlier results, which is significant for protein structure prediction.
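The abstract names the two quantities used for cluster-center selection (mean distance to the cluster center and cluster size) but not the rule that combines them. The following is a minimal sketch of one possible reading, not the authors' algorithm: the function name `select_cluster`, the `min_size` threshold and the tie-breaking order are all assumptions made for illustration. Clusters smaller than `min_size` are treated as exceptional and skipped, the remaining cluster with the smallest mean distance to its center is chosen, and the member closest to that center is returned as the representative.

```python
from math import dist
from statistics import mean


def select_cluster(points, labels, centers, min_size=3):
    """Pick one cluster from its mean distance-to-center and its size.

    Hypothetical rule (an assumption, not taken from the paper):
    clusters with fewer than `min_size` members are skipped as outliers;
    among the rest, the smallest mean distance to the center wins, and
    the larger cluster wins a tie.
    """
    # Group the points by cluster label.
    members = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)

    best = None
    for label, pts in members.items():
        if len(pts) < min_size:
            continue  # too few members: treated as an exceptional cluster
        spread = mean(dist(p, centers[label]) for p in pts)
        key = (spread, -len(pts))  # tighter first, then larger
        if best is None or key < best[0]:
            best = (key, label)

    if best is None:
        raise ValueError("no cluster has at least min_size members")

    chosen = best[1]
    # Representative: the member closest to the chosen cluster's center.
    representative = min(members[chosen], key=lambda p: dist(p, centers[chosen]))
    return chosen, representative
```

Here `points` would be feature vectors built from the similarity measures, `labels` the cluster assignment from K-means++ or AP, and `centers` a sequence of cluster centers indexed by label. Without the size filter, a singleton cluster would always win because its mean distance is zero, which is presumably why the abstract pairs the two criteria.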
