Identification and Quantification of Genomic Repeats and Sample Contamination in Assemblies of 454 Pyrosequencing Reads

A. Nederbragt, T. Rounge, Kyrre Kausrud, K. Jakobsen
Journal: Next generation, sequencing & applications
Volume: 188 1, pages 1-12
Published: 2010-01-05
Publication type: Journal Article
DOI: 10.1155/2010/782465 (https://doi.org/10.1155/2010/782465)
Citations: 16

Abstract

Contigs assembled from 454 reads of bacterial genomes show a range of read depths, and a number of contigs have a depth far higher than expected. For reference genome sequence datasets, the contig-specific read depth correlates strongly with the number of copies present in the genome. We developed a sequence of applied statistical analyses that suggest the number of copies present can be reliably estimated from the read depth distribution in de novo genome assemblies. Read depths of contigs in de novo cyanobacterial genome assemblies were determined, and several high-read-depth contigs were identified. These contigs were shown to contain mainly genes known to be present in multiple copies in bacterial genomes. For these assemblies, a correlation between read depth and copy number was demonstrated experimentally using real-time PCR. Copy number estimates obtained with the statistical analysis developed in this work are presented. Per-contig read depth analysis of assemblies based on 454 reads therefore enables de novo detection of genomic repeats and estimation of their copy number. Additionally, our analysis efficiently identified contigs stemming from sample contamination, allowing their removal from the assembly.
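As a rough illustration of the idea that per-contig read depth tracks copy number, the sketch below divides each contig's read depth by the median depth across all contigs, on the assumption that most contigs are single-copy. This is a plain ratio heuristic, not the statistical analysis developed in the paper; the function name, the input layout, and the example numbers are hypothetical.

```python
from statistics import median


def estimate_copy_numbers(contigs):
    """Estimate relative copy number per contig from read depth.

    `contigs` maps a contig name to (total_read_bases, contig_length).
    Assumption (not from the paper): the median depth over all contigs
    represents a single-copy region of the genome.
    """
    # Read depth = bases of reads aligned to the contig / contig length.
    depths = {
        name: read_bases / length
        for name, (read_bases, length) in contigs.items()
    }
    # Baseline depth for a single-copy contig (assumed to be the median).
    baseline = median(depths.values())
    # Copy number estimate = depth relative to the single-copy baseline.
    return {name: depth / baseline for name, depth in depths.items()}


# Hypothetical example: four contigs near the baseline, one far above it.
example = {
    "contig_a": (20000, 1000),   # depth 20
    "contig_b": (42000, 2000),   # depth 21
    "contig_c": (19000, 1000),   # depth 19
    "contig_d": (40000, 2000),   # depth 20
    "contig_e": (100000, 1000),  # depth 100, candidate repeat
}
estimates = estimate_copy_numbers(example)
candidate_repeats = [name for name, cn in estimates.items() if round(cn) >= 2]
```

In this toy input, only `contig_e` stands well above the baseline and would be flagged as a candidate repeat. Real assemblies would need a baseline that is robust to contig length and depth variance, which is where a proper statistical treatment comes in.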