Decontamination of DNA sequences from a Streptomyces genome for optimal genome mining.

IF 2.1 · Zone 4 (Biology) · Q3 Microbiology
Raul Vitor Ferreira de Oliveira, Leandro Maza Garrido, Gabriel Padilla
Brazilian Journal of Microbiology, published 2025-01-15. DOI: 10.1007/s42770-024-01598-2 (https://doi.org/10.1007/s42770-024-01598-2)
Citations: 0

Decontamination of DNA sequences from a Streptomyces genome for optimal genome mining.

Despite meticulous precautions, contamination of genomic DNA samples is not uncommon and can significantly compromise the analysis of microbial whole-genome sequencing data, affecting all subsequent analyses. Thanks to advances in software and bioinformatics techniques, it is now possible to address this issue and avoid discarding the entire dataset from a whole-genome sequencing run contaminated with the DNA of another bacterium. In this study, the sequencing reads from Streptomyces sp. BRB040, generated on the HiSeq System platform (Illumina Inc., San Diego, USA), were found to be contaminated with DNA from Bacillus licheniformis. To eliminate the contamination, a combination of tools available on the Galaxy platform and other web-based resources (MeDuSa and BLAST) was used. The contaminated reads were treated as a metagenome in order to isolate the genome of the contaminating organism. They were assembled with metaSPAdes, yielding a large scaffold of 4.187 Mb that was identified as Bacillus licheniformis. Once the contaminant had been identified, its genome was used as a filter: reads that aligned to it with Bowtie 2 were removed. The remaining reads were then reassembled with Unicycler, yielding 117 contigs with a total size of 7.9 Mb. The completeness of this genome, assessed with BUSCO, was 95.9%. An alternative tool (BBduk) was also used to remove contaminating reads; the resulting Unicycler assembly comprised 85 contigs with a total size of 8.3 Mb and a completeness of 99.5%. These results were better than those obtained with SPAdes, which produced less complete genomes than Unicycler (at most 97.8% completeness) and could not adequately assemble the BBduk-decontaminated data. Compared with the uncontaminated BRB040 genome (total size 8.2 Mb, completeness 99.8%), the assembly built from BBduk-decontaminated reads performed best, with completeness only 0.3 percentage points below the reference. Genome mining with antiSMASH 7.0 identified 24 biosynthetic gene clusters (BGCs) in both the BBduk-derived assembly and the control BRB040 assembly. In silico decontamination therefore permits genome mining of BGCs despite the loss of some nucleotides. These findings show that contamination can be effectively removed from a genome using readily available online tools while preserving a dataset suitable for extracting valuable insights into the secondary metabolism of the target organism. This approach is particularly beneficial when resequencing the samples is not immediately feasible.
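
The read-filtering step described above can be illustrated with a toy sketch. It is not the authors' pipeline (they ran Bowtie 2 and BBduk, reportedly through Galaxy); it only shows the general idea of k-mer based decontamination, on which BBduk-style filtering rests: index the k-mers of the contaminant genome on both strands, then set aside any read that shares a k-mer with it. The sequences, read names, the value of `K`, and all function names below are hypothetical and chosen only for illustration.

```python
"""Toy k-mer read filter: set aside reads that share a k-mer with a contaminant.

Illustrative only; real tools add error tolerance, quality handling and
paired-end logic that are omitted here.
"""

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference, k):
    """Collect the k-mers of the reference on both strands into a set."""
    index = set(kmers(reference, k))
    index.update(kmers(reverse_complement(reference), k))
    return index


def split_reads(reads, contaminant_index, k):
    """Split (name, sequence) reads into clean reads and contaminant-matching reads."""
    clean, flagged = [], []
    for name, seq in reads:
        hit = any(kmer in contaminant_index for kmer in kmers(seq, k))
        (flagged if hit else clean).append((name, seq))
    return clean, flagged


# Hypothetical toy data standing in for the contaminant scaffold and the reads.
contaminant = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
reads = [
    ("read1", "GCGTACGTTAGC"),                        # forward-strand fragment of the contaminant
    ("read2", "TTTTTTTTGGGGGGGGCC"),                  # unrelated sequence
    ("read3", reverse_complement("CCTAGGATCCGA")),    # reverse-strand fragment of the contaminant
]

K = 8  # assumed k-mer length for this toy example
index = build_index(contaminant, K)
clean, flagged = split_reads(reads, index, K)

print("kept:", [name for name, _ in clean])
print("removed:", [name for name, _ in flagged])
```

In this sketch a single shared k-mer is enough to discard a read, which is deliberately aggressive; that choice trades some loss of target nucleotides for a cleaner dataset, in line with the abstract's remark that genome mining remained possible despite the loss of nucleotides.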

Source journal
Brazilian Journal of Microbiology (Biology, Microbiology)
CiteScore: 4.10
Self-citation rate: 4.50%
Articles published: 216
Review time: 1.0 months
Journal description: The Brazilian Journal of Microbiology is an international peer-reviewed journal that covers a wide range of research on fundamental and applied aspects of microbiology. The journal considers for publication original research articles, short communications, reviews, and letters to the editor, which may be submitted to the following sections: Biotechnology and Industrial Microbiology, Food Microbiology, Bacterial and Fungal Pathogenesis, Clinical Microbiology, Environmental Microbiology, Veterinary Microbiology, Fungal and Bacterial Physiology, Bacterial, Fungal and Virus Molecular Biology, Education in Microbiology. For more details on each section, please check the instructions for authors. The journal is the official publication of the Brazilian Society of Microbiology and currently publishes 4 issues per year.