Fast and sensitive algorithm for aligning ESTs to human genome

Proceedings. IEEE Computer Society Bioinformatics Conference, pp. 43-53. Pub Date: 2002-08-14. DOI: 10.1109/CSB.2002.1039328

Jun Ogasawara, S. Morishita


Citations: 7

Abstract

There is a pressing need to align the growing set of expressed sequence tags (ESTs) to the newly sequenced human genome. The problem is complicated, however, by the exon/intron structure of eukaryotic genes, misread nucleotides in ESTs, and the millions of repetitive sequences in the genome. Algorithms based on dynamic programming (DP) have been proposed to solve it, but in practice they require an enormous amount of processing time. To improve on the computational efficiency of these classical DP algorithms, we develop software that makes full use of a lookup table to efficiently detect the start and end points of an EST within a given DNA sequence and then promptly identify its exons and introns. In addition, high sensitivity and accuracy must be achieved by correctly calculating the locations of all splice sites for as many ESTs as possible, while retaining high computational efficiency. This goal is hard to accomplish in practice because of misread nucleotides in ESTs and repetitive sequences in the genome, but we present a couple of heuristics that effectively address this issue. Experimental results confirm that our technique reduces overall computation time by orders of magnitude compared with common tools such as sim4 and BLAT, while attaining high sensitivity and accuracy on datasets of clean, documented genes.
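The abstract describes the lookup table only at a high level. As a rough illustration, the Python sketch below builds a k-mer lookup table over a genomic sequence, uses an EST's first and last k-mers to find candidate start and end points, and then splits the EST into two exons around one intron. Everything here is an assumption for illustration, not the authors' algorithm: the function names (`build_kmer_index`, `locate_est_ends`, `find_single_intron`), the parameters `k` and `max_span`, exact matching, a single intron, and the canonical GT...AG splice-site rule. It omits the paper's heuristics for misread nucleotides and repetitive sequences.

```python
# Hypothetical sketch, not the paper's implementation: exact matches only,
# at most one intron, canonical GT...AG splice signals assumed.
from collections import defaultdict
from typing import Optional


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Lookup table mapping each k-mer of the genome to its ascending start positions."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        table[genome[i:i + k]].append(i)
    return dict(table)


def locate_est_ends(est: str, index: dict[str, list[int]], k: int,
                    max_span: int) -> Optional[tuple[int, int]]:
    """Locate an EST's genomic span [start, end) from its first and last k-mers.

    Every hit of the head k-mer is paired with every hit of the tail k-mer; the
    shortest pairing that is at least len(est) long (introns only add length) and
    at most max_span long is returned, or None if no pairing fits.
    """
    best: Optional[tuple[int, int]] = None
    for start in index.get(est[:k], []):
        for tail in index.get(est[-k:], []):
            end = tail + k
            span = end - start
            if len(est) <= span <= max_span:
                if best is None or span < best[1] - best[0]:
                    best = (start, end)
    return best


def find_single_intron(est: str, genome: str, start: int,
                       end: int) -> Optional[tuple[int, int]]:
    """Explain genome[start:end] as exon1 + intron + exon2 with exon1 + exon2 == est.

    The intron must start with GT and end with AG; returns it as a half-open
    genomic interval (donor, acceptor), or None if no split point fits.
    """
    region = genome[start:end]
    intron_len = len(region) - len(est)
    if intron_len < 4:  # too short to hold both a GT donor and an AG acceptor
        return None
    for j in range(1, len(est)):  # j = length of the first exon
        intron = region[j:j + intron_len]
        if (region[:j] == est[:j]
                and region[j + intron_len:] == est[j:]
                and intron.startswith("GT") and intron.endswith("AG")):
            return (start + j, start + j + intron_len)
    return None


if __name__ == "__main__":
    # Toy genome: flank + exon1 + GT...AG intron + exon2 + flank.
    genome = "TTTT" + "ATGCCATG" + "GTAAAAAG" + "CATTCCGA" + "TTTT"
    est = "ATGCCATG" + "CATTCCGA"
    index = build_kmer_index(genome, k=4)
    span = locate_est_ends(est, index, k=4, max_span=100)
    print(span)  # (4, 28)
    if span is not None:
        print(find_single_intron(est, genome, *span))  # (12, 20)
```

Pairing head and tail hits is quadratic in the number of hits, so in a repeat-rich genome a single k-mer can return a great many candidates; that is where the abstract says extra heuristics are needed, whereas this sketch simply keeps the shortest compatible span. The GT...AG check also settles a real ambiguity: in the toy example, splitting the EST one base earlier reproduces the same exon sequences, but that candidate intron starts with GG and is rejected.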

