Using the longest run subsequence problem within homology-based scaffolding.

IF 1.5 4区 生物学 Q4 BIOCHEMICAL RESEARCH METHODS
Sven Schrinner, Manish Goel, Michael Wulfert, Philipp Spohr, Korbinian Schneeberger, Gunnar W Klau
{"title":"Using the longest run subsequence problem within homology-based scaffolding.","authors":"Sven Schrinner,&nbsp;Manish Goel,&nbsp;Michael Wulfert,&nbsp;Philipp Spohr,&nbsp;Korbinian Schneeberger,&nbsp;Gunnar W Klau","doi":"10.1186/s13015-021-00191-8","DOIUrl":null,"url":null,"abstract":"<p><p>Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"16 1","pages":"11"},"PeriodicalIF":1.5000,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8240273/pdf/","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-021-00191-8","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 2

Abstract

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.

Abstract Image

Abstract Image

Abstract Image

在基于同构的脚手架中使用最长运行子序列问题。
基因组组装是计算基因组学中的一个重要问题。在这里,我们建议解决在基于同源的支架中出现的问题,即当连接和排序contigs以获得更大的伪染色体时,通过相关物种的第二次不完整组装。这个想法是使用一个组合中的分框区域的比对来找到另一个组合中最同源的组合。我们证明了其他组件的排序可以用一个新的字符串问题来表示,即最长运行子序列问题(LRS)。我们证明了LRS是np困难的,并提出了约简规则和两种算法方法,它们一起能够解决大型LRS实例的可证明最优性。实验中使用的所有数据以及我们的源代码都是免费提供的。我们通过在短计算时间内解决部分拟南芥组装产生的实际实例,证明了它在现有的更大的脚手架方法中的有用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Algorithms for Molecular Biology
Algorithms for Molecular Biology 生物-生化研究方法
CiteScore
2.40
自引率
10.00%
发文量
16
审稿时长
>12 weeks
期刊介绍: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信