Lorie Dudoignon, E. Glémet, H. C. Heus, M. Raffinot
{"title":"High similarity sequence comparison in clustering large sequence databases","authors":"Lorie Dudoignon, E. Glémet, H. C. Heus, M. Raffinot","doi":"10.1109/CSB.2002.1039345","DOIUrl":null,"url":null,"abstract":"We present a fast algorithm for sequence clustering and searching which works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. Furthermore, the algorithm also works with proteic sequences and large sequences like entire chromosomes. We present a theoretical study of our approach and provide experimental results.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"228-236"},"PeriodicalIF":0.0000,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039345","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE Computer Society Bioinformatics Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSB.2002.1039345","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
We present a fast algorithm for sequence clustering and searching which works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. Furthermore, the algorithm also works with proteic sequences and large sequences like entire chromosomes. We present a theoretical study of our approach and provide experimental results.