An efficient overlap graph coarsening approach for modeling short reads

Julia D. Warnke-Sommer, H. Ali
Published in: 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops
Publication date: 2012-10-04
DOI: 10.1109/BIBMW.2012.6470223 (https://doi.org/10.1109/BIBMW.2012.6470223)
Citations: 5

Abstract

Next-generation sequencing has quickly emerged as the most exciting yet challenging computational problem in bioinformatics. Current sequencing technologies can produce hundreds of thousands to millions of short sequence reads in a single run. However, current methods for managing, storing, and processing these reads remain for the most part simple, and lack the sophistication needed to model the reads efficiently and assemble them correctly. Reads are produced at high coverage of the original target sequence, so many of them overlap. These overlap relationships are used to align and merge reads into contiguous sequences called contigs. In this paper, we present an overlap graph coarsening scheme for modeling reads and their overlap relationships. Our approach differs from previous read analysis and assembly methods, which use a single graph to model read overlap relationships. Instead, we use a series of graphs with different granularities of information to represent the complex read overlap relationships. We present a new graph coarsening algorithm for clustering a simulated metagenomics dataset at various levels of granularity. We also use the proposed graph coarsening scheme, together with graph traversal algorithms, to find a labeling of the overlap graph that allows nodes to be organized efficiently within the graph data structure.
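The abstract does not spell out the coarsening algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes exact suffix-prefix matching to detect overlaps and swaps in greedy heavy-edge matching (a common generic coarsening heuristic) to merge nodes. All function names, the `min_len` threshold, and the toy reads are hypothetical and chosen for illustration.

```python
from itertools import permutations


def overlap_length(a, b, min_len=3):
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Undirected weighted graph as {frozenset({i, j}): best overlap length}."""
    edges = {}
    for i, j in permutations(range(len(reads)), 2):
        k = overlap_length(reads[i], reads[j], min_len)
        if k:
            key = frozenset((i, j))
            edges[key] = max(edges.get(key, 0), k)
    return edges


def coarsen(nodes, edges):
    """One coarsening level by greedy heavy-edge matching (assumed heuristic).

    nodes: dict node_id -> list of original read indices in that node
    edges: dict frozenset({u, v}) -> weight
    """
    matched = {}
    next_id = 0
    # Visit the heaviest edges first; merge endpoints that are both unmatched.
    for key, _ in sorted(edges.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        u, v = sorted(key)
        if u not in matched and v not in matched:
            matched[u] = matched[v] = next_id
            next_id += 1
    # Unmatched nodes are carried over to the coarser graph unchanged.
    for u in sorted(nodes):
        if u not in matched:
            matched[u] = next_id
            next_id += 1
    coarse_nodes = {}
    for u, members in nodes.items():
        coarse_nodes.setdefault(matched[u], []).extend(members)
    # Edges between different coarse nodes are kept, summing parallel weights.
    coarse_edges = {}
    for key, w in edges.items():
        u, v = key
        cu, cv = matched[u], matched[v]
        if cu != cv:
            ck = frozenset((cu, cv))
            coarse_edges[ck] = coarse_edges.get(ck, 0) + w
    return coarse_nodes, coarse_edges


def coarsen_series(reads, min_len=3):
    """Series of graphs from finest (one node per read) to coarsest (no edges left)."""
    edges = build_overlap_graph(reads, min_len)
    nodes = {i: [i] for i in range(len(reads))}
    levels = [(nodes, edges)]
    while edges:
        nodes, edges = coarsen(nodes, edges)
        levels.append((nodes, edges))
    return levels


# Toy reads (hypothetical): the first three chain together, the last is unrelated.
reads = ["ACGTAC", "TACGGA", "GGATTC", "TTTTTT"]
levels = coarsen_series(reads)
for depth, (nodes, edges) in enumerate(levels):
    print(depth, sorted(sorted(m) for m in nodes.values()), len(edges))
```

Each entry in `levels` is one granularity of the same read set: the first holds one node per read, and each later level groups reads whose overlaps were merged. The all-pairs comparison in `build_overlap_graph` is quadratic in the number of reads and is only suitable for a toy example; a dataset of millions of reads would need an indexed overlap detection step.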