Predicting Ranked SCOP Domains by Mining Associations of Visual Contents in Distance Matrices

Proceedings of the ... Asia-Pacific bioinformatics conference. Pub Date: 2005-12-01. DOI: 10.1142/9781860947292_0008

Pin-Hao Chi, C. Shyu

Pages: 49-58

Citations: 2

Abstract

Protein tertiary structures are known to correlate significantly with biological function. To organize structural information, the Structural Classification of Proteins (SCOP) database, which is manually constructed by human experts, classifies similar protein folds into the same domain hierarchy. Although this manual approach is believed to be more reliable than applying traditional alignment methods to structural classification, it is labor intensive. In this paper, we build a non-parametric classifier to predict possible SCOP domains for unknown protein structures. Using supervised learning, the algorithm first maps the tertiary structures of training proteins into two-dimensional distance matrices and then extracts signatures from the visual contents of those matrices. A knowledge discovery and data mining (KDD) process then discovers relevant patterns in the training signatures of each SCOP domain by mining association rules. Finally, the number of rules whose patterns match the signature of an unknown protein determines the predicted domains in ranked order. We select 7,702 protein chains from 150 domains of the SCOP 1.67 release as labelled data and evaluate with 10-fold cross-validation. Experimental results show a prediction accuracy of 91.27% for the top-ranked domain and 99.22% for the top 5 ranked domains. The average response time is 6.34 seconds, so the method combines reasonably high prediction accuracy with efficiency.
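The first and last steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which atoms define the distance matrix (the sketch assumes one 3D coordinate per residue, for example the C-alpha atom), how signatures are extracted from the matrix, or how rules are mined. The function names `distance_matrix` and `rank_domains`, the feature labels `f1` to `f4`, and the domain labels are all hypothetical, and each rule is assumed to be a set of signature items that matches when every item is present.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between per-residue 3D coordinates.

    Assumption: one point per residue (e.g. the C-alpha atom).
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


def rank_domains(signature, rules_by_domain, top_k=5):
    """Rank domains by the number of their rules matched by the signature.

    Assumption: a rule is a frozenset of signature items and matches
    when it is a subset of the query signature.
    """
    items = set(signature)
    counts = {
        domain: sum(1 for rule in rules if rule <= items)
        for domain, rules in rules_by_domain.items()
    }
    # Highest match count first; ties broken by domain label for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy structure with three residues (coordinates are made up).
coords = [(0, 0, 0), (3, 4, 0), (3, 4, 12)]
dm = distance_matrix(coords)

# Hypothetical mined rules for two hypothetical domains.
rules_by_domain = {
    "domain_A": [frozenset({"f1", "f2"}), frozenset({"f3"})],
    "domain_B": [frozenset({"f2", "f4"})],
}
ranking = rank_domains({"f1", "f2", "f3"}, rules_by_domain)
print(ranking)
```

In this toy case the query signature satisfies both rules of `domain_A` and none of `domain_B`, so `domain_A` is ranked first. Returning the top `top_k` entries mirrors the top-1 and top-5 evaluation reported in the abstract.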
