Flowtigs: Safety in flow decompositions for assembly graphs

Impact factor 4.6; Zone 2 (multidisciplinary journals); JCR Q1, Multidisciplinary Sciences
Francisco Sena, Eliel Ingervo, Shahbaz Khan, Andrey Prjibelski, Sebastian Schmidt, Alexandru Tomescu
iScience 27(12), Article 111208. Published 2024-10-25.
DOI: 10.1016/j.isci.2024.111208
https://www.sciencedirect.com/science/article/pii/S2589004224024337

Abstract

A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that occur as subwalks in every possible flow decomposition. We provide an O(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, reporting flowtigs results in a notably more contiguous assembly than reporting unitigs or maximal safe walks based only on the graph structure. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.
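
To make the two definitions in the abstract concrete, the following minimal Python sketch checks that a set of weighted walks superposes to a given flow, and tests whether a candidate walk occurs as a subwalk in each of two hand-written decompositions. The toy graph, the flow values, and all function names are assumptions made for illustration; none of them come from the paper, and the sketch is not the paper's O(mn)-time algorithm. Comparing two decompositions can only refute safety, not prove it.

```python
from collections import Counter

def superposition(walks):
    """Sum the weights of all walks on each edge they traverse."""
    total = Counter()
    for nodes, weight in walks:
        for edge in zip(nodes, nodes[1:]):
            total[edge] += weight  # an edge repeated within a walk counts each time
    return dict(total)

def is_decomposition(walks, flow):
    """True if all weights are positive and the walks add up exactly to the flow."""
    return all(weight > 0 for _, weight in walks) and superposition(walks) == flow

def is_subwalk(candidate, nodes):
    """True if `candidate` occurs as a contiguous run of nodes inside `nodes`."""
    k = len(candidate)
    return any(nodes[i:i + k] == candidate for i in range(len(nodes) - k + 1))

def appears_in(candidate, walks):
    """True if `candidate` is a subwalk of at least one walk of the decomposition."""
    return any(is_subwalk(candidate, nodes) for nodes, _ in walks)

# Hypothetical toy flow: a and b act as sources, c and d as sinks.
flow = {("a", "v"): 3, ("b", "v"): 1, ("v", "c"): 3, ("v", "d"): 1}

# Two different decompositions of the same flow.
dec1 = [(["a", "v", "c"], 3), (["b", "v", "d"], 1)]
dec2 = [(["a", "v", "c"], 2), (["a", "v", "d"], 1), (["b", "v", "c"], 1)]

for dec in (dec1, dec2):
    print(is_decomposition(dec, flow),
          appears_in(["a", "v", "c"], dec),
          appears_in(["b", "v", "d"], dec))
```

In this toy flow, the walk a, v, c occurs in both decompositions, and it must occur in every one: only 1 unit leaves v toward d, so at least 2 of the 3 units arriving on (a, v) have to continue to c. The walk b, v, d is missing from the second decomposition, so it cannot be safe. The same graph with different flow values could give a different answer, which is why a characterization that uses the flow can report longer walks than one based on the graph structure alone.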
