Title: Fast and sensitive algorithm for aligning ESTs to human genome
Authors: Jun Ogasawara, S. Morishita
Venue: Proceedings. IEEE Computer Society Bioinformatics Conference, volume 1 1, pages 43-53
Published: 2002-08-14
DOI: https://doi.org/10.1109/CSB.2002.1039328
Citations: 7
Abstract
There is a pressing need to align the growing set of expressed sequence tags (ESTs) to the newly sequenced human genome. The problem is complicated, however, by the exon/intron structure of eukaryotic genes, misread nucleotides in ESTs, and millions of repetitive sequences in the genome. Algorithms based on dynamic programming (DP) have been proposed to solve it, but in practice they require an enormous amount of processing time. To improve on the computational efficiency of these classical DP algorithms, we develop software that makes full use of a lookup table to efficiently detect the start and end points of an EST within a given DNA sequence and then promptly identify exons and introns. In addition, high sensitivity and accuracy must be achieved, meaning that the locations of all splice sites are calculated correctly for more ESTs, while high computational efficiency is retained. This goal is hard to accomplish in practice, owing to misread nucleotides in ESTs and repetitive sequences in the genome, but we present a couple of heuristics that are effective in addressing it. Experimental results confirm that our technique improves overall computation time by orders of magnitude compared with common tools such as sim4 and BLAT, while also attaining high sensitivity and accuracy on datasets of clean, documented genes.
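The abstract does not specify how the lookup table is built or queried. As a rough illustration of the general idea only, and not the authors' implementation, the Python sketch below indexes every k-mer of a genomic sequence and uses the first and last k-mers of an EST to propose candidate genomic spans. The function names, the exact-match assumption, the toy sequences, and the parameters `k` and `max_span` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_kmer_table(genome: str, k: int) -> dict:
    """Map each k-mer in `genome` to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(genome) - k + 1):
        table[genome[i:i + k]].append(i)
    return table


def candidate_spans(est: str, table: dict, k: int, max_span: int) -> list:
    """Propose half-open genomic intervals (start, end) that may contain the EST.

    Assumption for this sketch: the first and last k-mers of the EST occur
    exactly in the genome (no misread nucleotides at the ends).
    """
    starts = table.get(est[:k], [])   # hits for the EST's first k-mer
    ends = table.get(est[-k:], [])    # hits for the EST's last k-mer
    spans = []
    for s in starts:
        for e in ends:
            length = e + k - s
            # The span must be at least as long as the EST (introns only add
            # length) and no longer than a chosen upper bound.
            if len(est) <= length <= max_span:
                spans.append((s, e + k))
    return spans


# Toy data (hypothetical): two exons separated by one intron, with flanks.
exon1, intron, exon2 = "ACGGCT", "GTAAGTCCCCAG", "GATTACA"
genome = "TTTTT" + exon1 + intron + exon2 + "TTTTT"
est = exon1 + exon2  # a spliced transcript without the intron

table = build_kmer_table(genome, k=5)
spans = candidate_spans(est, table, k=5, max_span=100)
print(spans)
```

Each candidate span would still need a finer alignment step to place the exon/intron boundaries. Repetitive sequences show up here as multiple hits per k-mer, which is one reason additional heuristics are needed in practice.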