Raul Vitor Ferreira de Oliveira, Leandro Maza Garrido, Gabriel Padilla
Brazilian Journal of Microbiology (IF 2.1, JCR Q3, Microbiology). Published 2025-01-15. DOI: 10.1007/s42770-024-01598-2 (https://doi.org/10.1007/s42770-024-01598-2)
Decontamination of DNA sequences from a Streptomyces genome for optimal genome mining.
Despite meticulous precautions, contamination of genomic DNA samples is not uncommon and can significantly compromise the analysis of microbial whole-genome sequencing data, affecting all subsequent analyses. Thanks to advances in software and bioinformatics techniques, it is now possible to address this issue and avoid losing the entire dataset from a whole-genome sequencing run contaminated with the DNA of another bacterium. In this study, the sequencing reads from Streptomyces sp. BRB040, generated on the HiSeq System platform (Illumina Inc., San Diego, USA), were found to be contaminated with DNA from Bacillus licheniformis. To eliminate the contamination, a combination of tools available on the Galaxy platform and other web-based resources (MeDuSa and BLAST) was used. The contaminated reads were treated as a metagenome to isolate the genome of the contaminating organism: they were assembled with metaSPAdes, yielding a large scaffold of 4.187 Mb that was identified as Bacillus licheniformis. Once the contaminant was identified, its genome was used as a filter, and sequencing reads that aligned to it were removed with Bowtie 2. After the contaminating reads were removed, a new assembly was performed with Unicycler, yielding 117 contigs with a total size of 7.9 Mb. Completeness, assessed with BUSCO, was 95.9%. An alternative tool (BBduk) was also used to remove contaminating reads; the resulting Unicycler assembly comprised 85 contigs with a total size of 8.3 Mb and a completeness of 99.5%. These results were better than the assemblies obtained with SPAdes, which produced less complete genomes (maximum completeness of 97.8%) than Unicycler and could not adequately assemble the BBduk-decontaminated data.
Compared with the uncontaminated BRB040 genome, which has a total size of 8.2 Mb and a completeness of 99.8%, the assembly of the BBduk-decontaminated reads gave the better results, with completeness only 0.3 percentage points below the reference. Genome mining with antiSMASH 7.0 revealed 24 biosynthetic gene clusters (BGCs) in both the BBduk-derived assembly and the control BRB040 assembly. The in silico decontamination process allows genome mining of BGCs despite the loss of nucleotides. These findings show that contamination can be effectively removed from a genome using readily available online tools while preserving a dataset suitable for extracting valuable insights into the secondary metabolism of the target organism. This approach is particularly beneficial when resequencing the samples is not immediately feasible.
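The filtering step described above can be illustrated with a toy sketch. It shows the general idea of discarding reads that share k-mers with a contaminant reference, which is the principle behind k-mer filters such as BBduk. It is not BBduk's implementation and not the authors' Galaxy workflow; the function names, the value of k, and the short sequences are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_contaminant_index(reference: str, k: int) -> Set[str]:
    """Collect k-mers from both strands of the contaminant reference."""
    index = set(kmers(reference, k))
    index.update(kmers(reverse_complement(reference), k))
    return index


def is_contaminant(read: str, index: Set[str], k: int, min_hits: int = 1) -> bool:
    """Flag a read once it shares at least min_hits k-mers with the index."""
    hits = 0
    for kmer in kmers(read, k):
        if kmer in index:
            hits += 1
            if hits >= min_hits:
                return True
    return False


def filter_reads(
    reads: Iterable[Tuple[str, str]], index: Set[str], k: int, min_hits: int = 1
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (name, sequence) reads into kept and removed lists."""
    kept, removed = [], []
    for name, seq in reads:
        target = removed if is_contaminant(seq, index, k, min_hits) else kept
        target.append((name, seq))
    return kept, removed


# Hypothetical toy data: a made-up "contaminant" and three short reads.
contaminant = "ATGGCTAGCTAGGATCCGATCGATTACG"
reads = [
    ("read1", "GCTAGCTAGGATCC"),                        # forward-strand match
    ("read2", "GGGCGGCGCCGGCGGGCCGC"),                  # unrelated GC-rich read
    ("read3", reverse_complement("GATCCGATCGATTACG")),  # reverse-strand match
]
index = build_contaminant_index(contaminant, k=8)
kept, removed = filter_reads(reads, index, k=8)
print("kept:", [name for name, _ in kept])
print("removed:", [name for name, _ in removed])
```

In practice a real contaminant genome would be several megabases long and the reads would come from FASTQ files, so a production tool also has to handle quality scores, paired reads, and memory-efficient k-mer storage, none of which this sketch attempts.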
About the journal:
The Brazilian Journal of Microbiology is an international peer-reviewed journal that covers a wide range of research on fundamental and applied aspects of microbiology.
The journal considers for publication original research articles, short communications, reviews, and letters to the editor, which may be submitted to the following sections: Biotechnology and Industrial Microbiology; Food Microbiology; Bacterial and Fungal Pathogenesis; Clinical Microbiology; Environmental Microbiology; Veterinary Microbiology; Fungal and Bacterial Physiology; Bacterial, Fungal and Virus Molecular Biology; and Education in Microbiology. For more details on each section, please see the instructions for authors.
The journal is the official publication of the Brazilian Society of Microbiology and currently publishes 4 issues per year.