A New Method for Mapping Short DNA Sequencing Reads by Using Quality Scores

H. Ozer, Terry Camerlengo, T. Huang, Kun Huang
2009 Ohio Collaborative Conference on Bioinformatics, published 15 June 2009. DOI: 10.1109/OCCBIO.2009.35
Citations: 1

Abstract

New high-throughput sequencing technologies can generate millions of short DNA sequences that must be mapped accurately to the reference genome. Most mapping algorithms handle variation in the quality of these short sequences by allowing more mismatches and/or gaps in the alignment, and they concentrate on improving runtime. In this paper, we investigate ways to classify the quality scores of short DNA sequencing reads and to integrate them into the mapping process. We specifically study quality scores that suggest two alternate bases (at a given position, the top quality scores for two bases are close to each other) and the use of such bases to improve mapping accuracy. Our method generates alternative sequences whenever a read contains alternate-quality bases and then maps these alternative sequences to the reference genome. In a test on a portion of ChIP-seq data from an epigenetic study, we generated and mapped alternatives for 222,755 sequence reads (out of the original 2.5 million) that could not be mapped to the reference genome by the Eland algorithm. With this approach, we were able to map 12.8% of these reads (roughly 28,500) to unique positions in the genome by using their alternative bases. This study demonstrates that using alternative bases in mapping algorithms can improve mapping results dramatically.
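The abstract describes three steps: flag alternate-quality bases, expand each flagged read into its alternative sequences, and keep a read only if its alternatives land on a single genomic position. The sketch below illustrates that pipeline under several assumptions that are not taken from the paper. It assumes each sequencing cycle carries four quality values, one per base in A, C, G, T order. It treats two bases as "close" when their scores differ by at most a hypothetical `max_gap`, and it caps ambiguous cycles with a hypothetical `max_alternates` to bound the 2^k growth in alternatives. Finally, it substitutes an exact, forward-strand-only string search for Eland, which tolerates a few mismatches. All function names are illustrative.

```python
from itertools import product

BASES = "ACGT"  # assumed order of the four per-cycle quality values


def candidate_bases(scores, max_gap=5):
    """Per cycle, return the called base, plus the runner-up base when the
    top two quality scores differ by at most max_gap (a hypothetical cutoff)."""
    candidates = []
    for cycle in scores:
        ranked = sorted(range(4), key=lambda i: cycle[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if cycle[best] - cycle[second] <= max_gap:
            candidates.append(BASES[best] + BASES[second])
        else:
            candidates.append(BASES[best])
    return candidates


def alternative_sequences(scores, max_gap=5, max_alternates=3):
    """Enumerate every sequence formed by picking one candidate base per cycle.
    Reads with more than max_alternates ambiguous cycles are skipped."""
    cands = candidate_bases(scores, max_gap)
    if sum(len(c) > 1 for c in cands) > max_alternates:
        return []
    return ["".join(p) for p in product(*cands)]


def exact_hits(seq, reference):
    """All 0-based start positions of seq in reference (exact, forward strand only)."""
    hits, start = [], reference.find(seq)
    while start != -1:
        hits.append(start)
        start = reference.find(seq, start + 1)
    return hits


def map_with_alternatives(scores, reference, max_gap=5):
    """Map every alternative and return (sequence, position) only when exactly
    one placement exists across all alternatives; otherwise return None."""
    placements = set()
    for seq in alternative_sequences(scores, max_gap):
        for pos in exact_hits(seq, reference):
            placements.add((seq, pos))
    if len(placements) == 1:
        return next(iter(placements))
    return None


def confident(base, q=40):
    """Helper for the example: a cycle with a single clear base call."""
    return tuple(q if b == base else 0 for b in BASES)


reference = "AAAAGATTACACCCC"
# The called read is GATGACA, but cycle 3 scores G=30 vs T=28 (alternate-quality).
scores = [confident("G"), confident("A"), confident("T"), (0, 0, 30, 28),
          confident("A"), confident("C"), confident("A")]

print(exact_hits("GATGACA", reference))         # → []  (the called read does not map)
print(map_with_alternatives(scores, reference))  # → ('GATTACA', 4)
```

The uniqueness check mirrors the abstract's focus on reads mapped "to unique positions": a read whose alternatives hit several places is discarded rather than guessed. A realistic version would also search the reverse complement and allow mismatches, as production aligners do.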