Authors: J. Taheri, Albert Y. Zomaya
Venue: ACS/IEEE International Conference on Computer Systems and Applications - AICCSA 2010
Published: 2010-05-16
DOI: 10.1109/AICCSA.2010.5586934 (https://doi.org/10.1109/AICCSA.2010.5586934)
Citations: 3
RBT-Km: K-Means clustering for Multiple Sequence Alignment
This paper presents a novel approach to the Multiple Sequence Alignment (MSA) problem. K-Means clustering is combined with the Rubber Band Technique (RBT) to form an iterative optimization algorithm, RBT-Km, that finds the optimal alignment for a set of input protein sequences. In this technique, the MSA problem is modeled as a rubber band, while the solution space is modeled as a plate with several poles corresponding to locations in the input sequences that are most likely to be correlated and/or biologically related. K-Means clustering is then used to discriminate biologically related locations from those that may appear by chance. RBT-Km is tested on BALiBASE 2.0, one of the well-known benchmarks in this field. The results demonstrate the superiority of the proposed technique, even in the case of formidable sequences.
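The abstract does not say how poles are encoded as feature vectors, how many clusters are used, or how clusters are judged to be biologically related, so the following is only a minimal sketch of generic K-Means (Lloyd's algorithm), not the RBT-Km procedure itself. It assumes, purely for illustration, that each pole is a hypothetical 2-D point (position in sequence A, position in sequence B); the names `kmeans` and `poles` and the sample coordinates are invented for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical poles: (position in sequence A, position in sequence B).
poles = [(1, 1), (2, 2), (3, 3), (2, 3), (10, 2), (11, 3), (12, 2), (11, 1)]

# Seed with one pole from each visually distinct group (an assumption of this sketch).
centroids, labels = kmeans(poles, [poles[0], poles[4]])
print(labels)
print(centroids)
```

In this toy input the two groups are well separated, so the assignment stabilizes after one update. Real pole sets would need a principled choice of features, cluster count, and initialization, none of which the abstract specifies.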