RBT-Km: K-Means clustering for Multiple Sequence Alignment
J. Taheri, Albert Y. Zomaya
ACS/IEEE International Conference on Computer Systems and Applications - AICCSA 2010
Published: 2010-05-16
DOI: 10.1109/AICCSA.2010.5586934 (https://doi.org/10.1109/AICCSA.2010.5586934)
Citations: 3
Abstract
This paper presents a novel approach to the Multiple Sequence Alignment (MSA) problem. K-Means clustering is combined with the Rubber Band Technique (RBT) to form an iterative optimization algorithm, called RBT-Km, that searches for the optimal alignment of a set of input protein sequences. In this technique, the MSA problem is modeled as a rubber band, while the solution space is modeled as a plate with several poles corresponding to locations in the input sequences that are most likely to be correlated and/or biologically related. K-Means clustering is then used to discriminate biologically related locations from those that may appear by chance. RBT-Km is tested on one of the well-known benchmarks in this field (BAliBASE 2.0). The results demonstrate the superiority of the proposed technique, even on difficult sequences.
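The abstract does not say which features are clustered or how a cluster is judged to be biologically meaningful, so the following is only a minimal sketch of the general idea and not the authors' RBT-Km procedure. It assumes, purely for illustration, that each pole is a pair (position in sequence A, position in sequence B), that plain Lloyd's K-Means with deterministic farthest-point seeding is used, and that poles falling in clusters smaller than a hypothetical `min_size` threshold are treated as chance matches. The names `kmeans`, `filter_poles`, and `min_size` are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-Means; returns one cluster label per input point."""
    # Assumption: deterministic farthest-point seeding, so runs are reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def filter_poles(points: List[Point], k: int, min_size: int) -> List[Point]:
    """Keep poles whose cluster has at least min_size members (assumed rule)."""
    labels = kmeans(points, k)
    sizes = [labels.count(c) for c in range(k)]
    return [p for p, lab in zip(points, labels) if sizes[lab] >= min_size]


# Hypothetical candidate poles: two dense groups and one isolated match.
poles = [
    (10, 11), (11, 12), (12, 13), (13, 14),
    (25, 70),
    (40, 38), (41, 39), (42, 40), (43, 41),
]
kept = filter_poles(poles, k=3, min_size=2)
print(kept)
```

In this toy input the isolated pole `(25, 70)` ends up alone in its own cluster and is dropped, while the two dense groups are retained. A real system would need a principled choice of `k` and of the rejection criterion, neither of which the abstract specifies.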