Fuzzy Classification of Genome Sequences Prior to Assembly Based on Similarity Measures

S. Nasser, G. Vert, A. Breland, M. Nicolescu

NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society
Published: 2007-06-24
DOI: 10.1109/NAFIPS.2007.383864 (https://doi.org/10.1109/NAFIPS.2007.383864)
Citation count: 1
Abstract
Nucleotide sequencing of genomic data is an important step towards understanding gene expression. Current sequencing technology can process only several hundred base pairs at a time. Consequently, these sequenced substrings need to be assembled into the overall genome. However, insertions, deletions and substitutions can complicate the assembly of subsequences and confuse existing methods. What is needed is an approach that deals with ambiguity when matching and assembling a genome from its sequenced subsequences. This research develops fuzzy similarity measures between subsequences and incorporates them into an assembler based on fuzzy logic. The research also addresses the extensive computation required to cluster data into meaningful groups. Preliminary evaluation in conjunction with K-Means clustering suggests that this approach is at least as good as standard approaches and in some cases better.
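To make the idea of a graded similarity between subsequences concrete, here is a minimal Python sketch. The abstract does not define the paper's fuzzy similarity measures, so this uses a normalized edit distance as a stand-in: it yields a degree of similarity in [0, 1] that tolerates insertions, deletions and substitutions. The function names `edit_distance`, `fuzzy_similarity` and `assign_to_nearest` are hypothetical, and the assignment step uses representative reads in place of K-Means centroids; none of this is the authors' implementation.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_similarity(a: str, b: str) -> float:
    """Assumed stand-in measure: 1.0 for identical reads, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def assign_to_nearest(reads: List[str], representatives: List[str]) -> List[int]:
    """Assign each read to the index of its most similar representative."""
    return [
        max(range(len(representatives)),
            key=lambda k: fuzzy_similarity(read, representatives[k]))
        for read in reads
    ]


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG", "TTTGGGG"]
    representatives = ["ACGTACGT", "TTTTGGGG"]
    print(assign_to_nearest(reads, representatives))
```

Grouping reads this way before assembly means an assembler only has to compare reads within a group, which is one way to reduce the comparison cost the abstract mentions. The dynamic-programming distance is quadratic in read length, so this sketch is for illustration on short reads only.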