Combining de novo and reference-guided assembly with scaffold_builder.

Genivaldo Gz Silva, Bas E Dutilh, T David Matthews, Keri Elkins, Robert Schmieder, Elizabeth A Dinsdale, Robert A Edwards
Source Code for Biology and Medicine 8(1):23. Published 2013-11-22. DOI: 10.1186/1751-0473-8-23 (https://doi.org/10.1186/1751-0473-8-23)
Citations: 67

Abstract

Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds (super contigs of sequences joined by N-bases) based on similarity to a closely related reference sequence. This approach is independent of mate-pair information and can complement it in genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes, Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291, and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is also provided, allowing users to submit a reference genome and a set of contigs to be scaffolded.
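
The idea of joining contigs with N-bases according to their position on a reference can be illustrated with a minimal Python sketch. This is not the Scaffold_builder source code. It assumes that each contig has already been placed on the reference by some external aligner, and the names `Placement`, `build_scaffold` and `min_gap` are hypothetical. It also ignores contig orientation and does not merge overlapping contigs, both of which a real scaffolder would need to handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Placement:
    """A contig and its (assumed, precomputed) position on the reference."""
    name: str
    seq: str
    ref_start: int  # 0-based start on the reference
    ref_end: int    # exclusive end on the reference


def build_scaffold(placements: List[Placement], min_gap: int = 1) -> str:
    """Order contigs by reference position and join them with runs of N.

    The gap length is the distance between consecutive contigs on the
    reference. If contigs touch or overlap, `min_gap` N-bases are inserted
    instead of merging them (a deliberate simplification).
    """
    ordered = sorted(placements, key=lambda p: p.ref_start)
    parts = []
    prev_end = None
    for p in ordered:
        if prev_end is not None:
            gap = p.ref_start - prev_end
            parts.append("N" * max(gap, min_gap))
            prev_end = max(prev_end, p.ref_end)
        else:
            prev_end = p.ref_end
        parts.append(p.seq)
    return "".join(parts)


if __name__ == "__main__":
    contigs = [
        Placement("contig_2", "GGCC", 10, 14),
        Placement("contig_1", "ACGT", 0, 4),
    ]
    # Two contigs separated by 6 reference bases become one scaffold.
    print(build_scaffold(contigs))
```

In this toy example the two input contigs collapse into a single scaffold, which is the effect behind the reported drop in contig count and rise in average length: fewer, longer sequences with the unresolved regions marked by N-bases.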


Source journal: Source Code for Biology and Medicine (Decision Sciences: Information Systems and Management)
About the journal: Source Code for Biology and Medicine is a peer-reviewed, open access, online journal that publishes articles on source code employed over a wide range of applications in biology and medicine. The journal's aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required to solve certain computational problems for which source code or resources are limited.