A Robust Method for Biological Sequence Clustering

2006 IEEE International Conference on Information Reuse & Integration. Pub Date: 2006-12-04. DOI: 10.1109/IRI.2006.252427

Wei-bang Chen, Chengcui Zhang


Citations: 1

Abstract



In this paper, we propose a two-phase hybrid method for biological sequence clustering that combines the strengths of hierarchical agglomerative clustering and partition clustering. In phase I, the method uses a hierarchical agglomerative clustering algorithm to pre-cluster the aligned sequences. In phase II, it takes the pre-clustering result as the initial partition for a k-means partition clustering method based on profile hidden Markov models (HMMs). Such initial partitions, as opposed to random ones, are usually more reasonable and can therefore avoid the inconsistency that random initialization causes in partition clustering methods. In addition, the inaccuracy of hierarchical agglomerative clustering can be compensated for by the profile HMM-based k-means partition clustering, since the latter is model-based and can better describe the dynamic properties of the data in a cluster. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed hybrid clustering algorithm.
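The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes pre-aligned, equal-length sequences, uses average-linkage clustering on Hamming distance for phase I, and replaces the profile HMM with a much simpler per-column frequency profile (match states only, no insert or delete states) for phase II. All function names, the pseudocount, and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def hamming(a, b):
    """Fraction of alignment columns where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def agglomerative(seqs, k):
    """Phase I (assumed variant): average-linkage merging until k clusters remain."""
    clusters = [[i] for i in range(len(seqs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(hamming(seqs[a], seqs[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    labels = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def fit_profile(members, alphabet, pseudo=1.0):
    """Per-column log-probabilities: a simplified stand-in for a profile HMM."""
    profile = []
    for col in zip(*members):
        counts = Counter(col)
        total = len(col) + pseudo * len(alphabet)
        profile.append({s: math.log((counts[s] + pseudo) / total)
                        for s in alphabet})
    return profile

def log_likelihood(seq, profile):
    """Log-probability of a sequence under one cluster's profile."""
    return sum(col[s] for s, col in zip(seq, profile))

def refine(seqs, labels, k, alphabet, max_iter=50):
    """Phase II: k-means-style loop with profiles acting as cluster models."""
    for _ in range(max_iter):
        profiles = []
        for c in range(k):
            members = [s for s, l in zip(seqs, labels) if l == c]
            profiles.append(fit_profile(members, alphabet) if members else None)
        live = [c for c in range(k) if profiles[c] is not None]
        new_labels = [max(live, key=lambda c: log_likelihood(s, profiles[c]))
                      for s in seqs]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels

def hybrid_cluster(seqs, k):
    """Seed the model-based refinement with the hierarchical pre-clustering."""
    alphabet = sorted(set("".join(seqs)))  # gap symbols count as ordinary symbols
    return refine(seqs, agglomerative(seqs, k), k, alphabet)

# Toy aligned sequences, invented for illustration only.
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "TTGACCAA", "TTGACCAT", "TTGTCCAA"]
toy_labels = hybrid_cluster(toy, 2)
print(toy_labels)
```

Because phase I is deterministic, repeated runs start phase II from the same partition, which is the consistency benefit the abstract attributes to avoiding random initialization. A real profile HMM would also model insertions and deletions, which this sketch does not.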
