Fast and sensitive algorithm for aligning ESTs to human genome

Jun Ogasawara, S. Morishita
Proceedings. IEEE Computer Society Bioinformatics Conference, pp. 43-53. Published 2002-08-14. DOI: 10.1109/CSB.2002.1039328 (https://doi.org/10.1109/CSB.2002.1039328)
Citations: 7

Abstract

There is a pressing need to align the growing set of expressed sequence tags (ESTs) to the newly sequenced human genome. The problem is complicated, however, by the exon/intron structure of eukaryotic genes, misread nucleotides in ESTs, and millions of repetitive sequences in the genome. Algorithms based on dynamic programming (DP) have been proposed to solve it, but in practice they require an enormous amount of processing time. To improve on the computational efficiency of these classical DP algorithms, we develop software that makes full use of a lookup table to efficiently detect the start and end points of an EST within a given DNA sequence and then promptly identify exons and introns. At the same time, high sensitivity and accuracy must be achieved, meaning that the locations of all splice sites are calculated correctly for as many ESTs as possible, while retaining high computational efficiency. This goal is hard to accomplish in practice, owing to misread nucleotides in ESTs and repetitive sequences in the genome, but we present a couple of heuristics that are effective in settling this issue. Experimental results confirm that our technique improves overall computation time by orders of magnitude compared with common tools such as sim4 and BLAT, while attaining high sensitivity and accuracy on datasets of clean, documented genes.
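The lookup-table idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the table layout, the word length, or the heuristics, so the k-mer index, the value of `k`, and the function names `build_lookup_table` and `candidate_spans` are all assumptions made for illustration. The sketch indexes every k-mer of a genomic sequence, then looks up the first and last k-mers of an EST to propose candidate start and end points. It assumes error-free reads and ignores repeats, which are exactly the difficulties the paper's heuristics address.

```python
from collections import defaultdict


def build_lookup_table(genome: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer of the genome to the positions where it starts."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        table[genome[i:i + k]].append(i)
    return table


def candidate_spans(est: str, table: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (start, end) genomic spans, end exclusive, that begin with the
    EST's first k-mer and finish with its last k-mer.

    A span is kept only if it is at least as long as the EST, since introns
    can lengthen the genomic footprint but never shorten it.
    """
    starts = table.get(est[:k], [])
    ends = table.get(est[-k:], [])
    spans = []
    for s in starts:
        for e in ends:
            if e + k - s >= len(est):
                spans.append((s, e + k))
    return spans


# Hypothetical toy data: two exons separated by one intron.
exon1, intron, exon2 = "ACGTACGG", "GTAAGTCCCCAG", "TTGCATGC"
genome = "TT" + exon1 + intron + exon2 + "AA"
est = exon1 + exon2  # the spliced transcript contains no intron

table = build_lookup_table(genome, k=6)
print(candidate_spans(est, table, k=6))
```

Each candidate span would then be handed to a finer alignment step that locates the exon boundaries inside it; that step is outside the scope of this sketch.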