A Robust Method for Biological Sequence Clustering
Wei-bang Chen, Chengcui Zhang
2006 IEEE International Conference on Information Reuse & Integration, 2006-12-04
DOI: 10.1109/IRI.2006.252427 (https://doi.org/10.1109/IRI.2006.252427)
In this paper, we propose a two-phase hybrid method for biological sequence clustering that combines the strengths of hierarchical agglomerative clustering and partition clustering. In phase I, the method uses a hierarchical agglomerative clustering algorithm to pre-cluster the aligned sequences. In phase II, it takes the pre-clustering result as the initial partition for a k-means partition clustering method based on profile hidden Markov models (HMMs). Initial partitions generated in phase I are usually more reasonable than random ones, and can therefore avoid the inconsistency that random initialization causes in partition clustering methods. In addition, the profile HMM based k-means partition clustering can compensate for the inaccuracy of hierarchical agglomerative clustering, since it is model-based and can better describe the dynamic properties of the data in a cluster. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed hybrid clustering algorithm.
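The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: average linkage over Hamming distance in phase I, and a per-column symbol profile with add-one smoothing in phase II as a much simpler stand-in for a profile HMM (no insert or delete states). The function names, the alphabet, and the toy sequences are likewise hypothetical.

```python
import math
from itertools import combinations

ALPHABET = "ACGT-"  # assumed alphabet: nucleotides plus the gap symbol


def hamming(a, b):
    """Fraction of aligned columns in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def agglomerative(seqs, k):
    """Phase I: average-linkage agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(seqs))]

    def linkage(pair):
        a, b = pair
        total = sum(hamming(seqs[i], seqs[j])
                    for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # Merge the two closest clusters; a < b, so deleting b keeps a valid.
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] += clusters[b]
        del clusters[b]

    labels = [0] * len(seqs)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def fit_profile(members, length, pseudo=1.0):
    """Per-column log-probabilities (simplified stand-in for a profile HMM)."""
    profile = []
    for col in range(length):
        counts = {s: pseudo for s in ALPHABET}
        for seq in members:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({s: math.log(n / total) for s, n in counts.items()})
    return profile


def log_likelihood(seq, profile):
    """Log-probability of a sequence under a column-independent profile."""
    return sum(col[ch] for ch, col in zip(seq, profile))


def refine(seqs, labels, k, max_iter=20):
    """Phase II: k-means-style loop that refits models, then reassigns sequences."""
    length = len(seqs[0])
    for _ in range(max_iter):
        profiles = [
            fit_profile([s for s, l in zip(seqs, labels) if l == c], length)
            for c in range(k)
        ]
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(s, profiles[c]))
            for s in seqs
        ]
        if new_labels == labels:  # converged: no sequence changed cluster
            break
        labels = new_labels
    return labels


def hybrid_cluster(seqs, k):
    """Seed the model-based refinement with the hierarchical pre-clustering."""
    return refine(seqs, agglomerative(seqs, k), k)


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGT",
           "TTGACCAA", "TTGACCAT", "TTGTCCAA"]
    print(hybrid_cluster(toy, 2))
```

Because phase I is deterministic, repeated runs on the same input start phase II from the same partition, which is the consistency argument made in the abstract. The refinement step can still move a sequence that the hierarchical step placed poorly.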