Stephanie Chau, Carlos Rojas, Jorjeta G. Jetcheva, Mary Markart, Sudha Vijayakumar, Sophia Yuan, Vincent Stowbunenko, Amanda N. Shelton, William B. Andreopoulos
{"title":"比较核糖体组装和机器学习工具在微生物鉴定中的不同特性","authors":"Stephanie Chau, Carlos Rojas, Jorjeta G. Jetcheva, Mary Markart, Sudha Vijayakumar, Sophia Yuan, Vincent Stowbunenko, Amanda N. Shelton, William B. Andreopoulos","doi":"10.2174/0115748936299440240709070105","DOIUrl":null,"url":null,"abstract":"Background: Genome assembly tools are used to reconstruct genomic sequences from raw sequencing data, which are then used for identifying the organisms present in a metagenomic sample. Methodology: More recently, machine learning approaches have been applied to a variety of bioinformatics problems, and in this paper, we explore their use for organism identification. We start by evaluating several commonly used metagenomic assembly tools, including PhyloFlash, MEGAHIT, MetaSPAdes, Kraken2, Mothur, UniCycler, and PathRacer, and compare them against state-of-theart deep learning-based machine learning classification approaches represented by DNABERT and DeLUCS, in the context of two synthetic mock community datasets. Result: Our analysis focuses on determining whether ensembling metagenome assembly tools with machine learning tools have the potential to improve identification performance relative to using the tools individually. Conclusion: We find that this is indeed the case, and analyze the level of effectiveness of potential tool ensembling for organisms with different characteristics (based on factors such as repetitiveness, genome size, and GC content).","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"56 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison between Ribosomal Assembly and Machine Learning Tools for Microbial Identification of Organisms with Different Characteristics\",\"authors\":\"Stephanie Chau, Carlos Rojas, Jorjeta G. Jetcheva, Mary Markart, Sudha Vijayakumar, Sophia Yuan, Vincent Stowbunenko, Amanda N. Shelton, William B. Andreopoulos\",\"doi\":\"10.2174/0115748936299440240709070105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Genome assembly tools are used to reconstruct genomic sequences from raw sequencing data, which are then used for identifying the organisms present in a metagenomic sample. Methodology: More recently, machine learning approaches have been applied to a variety of bioinformatics problems, and in this paper, we explore their use for organism identification. We start by evaluating several commonly used metagenomic assembly tools, including PhyloFlash, MEGAHIT, MetaSPAdes, Kraken2, Mothur, UniCycler, and PathRacer, and compare them against state-of-theart deep learning-based machine learning classification approaches represented by DNABERT and DeLUCS, in the context of two synthetic mock community datasets. Result: Our analysis focuses on determining whether ensembling metagenome assembly tools with machine learning tools have the potential to improve identification performance relative to using the tools individually. Conclusion: We find that this is indeed the case, and analyze the level of effectiveness of potential tool ensembling for organisms with different characteristics (based on factors such as repetitiveness, genome size, and GC content).\",\"PeriodicalId\":10801,\"journal\":{\"name\":\"Current Bioinformatics\",\"volume\":\"56 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-07-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Current Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.2174/0115748936299440240709070105\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.2174/0115748936299440240709070105","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Comparison between Ribosomal Assembly and Machine Learning Tools for Microbial Identification of Organisms with Different Characteristics
Background: Genome assembly tools are used to reconstruct genomic sequences from raw sequencing data, which are then used for identifying the organisms present in a metagenomic sample. Methodology: More recently, machine learning approaches have been applied to a variety of bioinformatics problems, and in this paper, we explore their use for organism identification. We start by evaluating several commonly used metagenomic assembly tools, including PhyloFlash, MEGAHIT, MetaSPAdes, Kraken2, Mothur, UniCycler, and PathRacer, and compare them against state-of-theart deep learning-based machine learning classification approaches represented by DNABERT and DeLUCS, in the context of two synthetic mock community datasets. Result: Our analysis focuses on determining whether ensembling metagenome assembly tools with machine learning tools have the potential to improve identification performance relative to using the tools individually. Conclusion: We find that this is indeed the case, and analyze the level of effectiveness of potential tool ensembling for organisms with different characteristics (based on factors such as repetitiveness, genome size, and GC content).
期刊介绍:
Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science.
The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications on key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.