Optimal protein structure extraction with clustering algorithm

Jiafu Zhao, Xiaolong Zhang, Min Cao Hubei
Published in: 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA)
Publication date: 2018-05-01
DOI: 10.1109/ICIEA.2018.8397889
URL: https://doi.org/10.1109/ICIEA.2018.8397889
Citation count: 0

Abstract

In this paper, we use the K-means++ and affinity propagation (AP) algorithms to cluster on five protein similarity measures: RMSD, TM, MaxSub, GDT-TS, and GDT-HA. To select the number of clusters, we evaluate the clustering results with the metrics provided by Scikit-learn and keep the number of clusters that scores best. We also tune the AP algorithm through its preference value, replacing the usual choices (the mean, maximum, or minimum value) with the mean of the four neighbours at the corresponding position. Moreover, we propose a cluster-center selection algorithm based on the mean distance from the data points to the cluster center and on the number of data points in a cluster; it automatically discards outliers and thereby improves the accuracy of cluster-center selection. After clustering and cluster-center selection, the extracted protein structures are more similar to the native protein than in earlier results, which is significant for protein structure prediction.
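The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions: each candidate structure is taken to be one row of five scores (RMSD, TM, MaxSub, GDT-TS, GDT-HA), the silhouette score stands in for the unspecified Scikit-learn metrics, and the function names `pick_k_by_silhouette`, `cluster_with_ap`, and `pick_cluster` are hypothetical. The AP helper covers only the usual scalar preference choices (median, mean, min, max); the four-neighbour preference is not specified in the abstract and is not reproduced here. The selection rule in `pick_cluster` (drop clusters below a size threshold, then take the tightest one) is one plausible reading of "mean distance and number of data points", not the published rule.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import euclidean_distances


def pick_k_by_silhouette(X, k_values, seed=0):
    """Fit K-means++ for each k and keep the k with the highest silhouette score.

    Returns (best_k, labels, centers).
    """
    best = None
    for k in k_values:
        km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=seed).fit(X)
        score = silhouette_score(X, km.labels_)
        if best is None or score > best[0]:
            best = (score, k, km.labels_, km.cluster_centers_)
    return best[1], best[2], best[3]


def cluster_with_ap(X, preference_rule="median", seed=0):
    """Run affinity propagation with a scalar preference.

    The preference is derived from the off-diagonal similarities
    (negative squared Euclidean distances) using the named rule.
    """
    S = -euclidean_distances(X, squared=True)
    off_diag = S[~np.eye(len(X), dtype=bool)]
    rules = {"median": np.median, "mean": np.mean,
             "min": np.min, "max": np.max}
    preference = rules[preference_rule](off_diag)
    ap = AffinityPropagation(affinity="precomputed", preference=preference,
                             damping=0.9, random_state=seed)
    return ap.fit(S).labels_


def pick_cluster(X, labels, centers, min_size=2):
    """Assumed selection rule, for illustration only.

    Clusters with fewer than min_size members are treated as outliers and
    skipped; among the rest, the cluster with the smallest mean
    point-to-center distance is returned (as its label).
    """
    best_label, best_dist = None, np.inf
    for label, center in enumerate(centers):
        members = X[labels == label]
        if len(members) < min_size:
            continue
        mean_dist = np.linalg.norm(members - center, axis=1).mean()
        if mean_dist < best_dist:
            best_label, best_dist = label, mean_dist
    return best_label
```

With a score matrix `X` of shape (number of structures, 5), a call such as `k, labels, centers = pick_k_by_silhouette(X, range(2, 7))` followed by `pick_cluster(X, labels, centers)` returns the label of the selected cluster, whose center can then be read from `centers`. Because the five measures live on different scales, standardizing the columns before clustering would probably be advisable.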