Comparison between Ribosomal Assembly and Machine Learning Tools for Microbial Identification of Organisms with Different Characteristics

IF 2.4 3区 生物学 Q3 BIOCHEMICAL RESEARCH METHODS
Stephanie Chau, Carlos Rojas, Jorjeta G. Jetcheva, Mary Markart, Sudha Vijayakumar, Sophia Yuan, Vincent Stowbunenko, Amanda N. Shelton, William B. Andreopoulos
{"title":"Comparison between Ribosomal Assembly and Machine Learning Tools for Microbial Identification of Organisms with Different Characteristics","authors":"Stephanie Chau, Carlos Rojas, Jorjeta G. Jetcheva, Mary Markart, Sudha Vijayakumar, Sophia Yuan, Vincent Stowbunenko, Amanda N. Shelton, William B. Andreopoulos","doi":"10.2174/0115748936299440240709070105","DOIUrl":null,"url":null,"abstract":"Background: Genome assembly tools are used to reconstruct genomic sequences from raw sequencing data, which are then used for identifying the organisms present in a metagenomic sample. Methodology: More recently, machine learning approaches have been applied to a variety of bioinformatics problems, and in this paper, we explore their use for organism identification. We start by evaluating several commonly used metagenomic assembly tools, including PhyloFlash, MEGAHIT, MetaSPAdes, Kraken2, Mothur, UniCycler, and PathRacer, and compare them against state-of-theart deep learning-based machine learning classification approaches represented by DNABERT and DeLUCS, in the context of two synthetic mock community datasets. Result: Our analysis focuses on determining whether ensembling metagenome assembly tools with machine learning tools have the potential to improve identification performance relative to using the tools individually. Conclusion: We find that this is indeed the case, and analyze the level of effectiveness of potential tool ensembling for organisms with different characteristics (based on factors such as repetitiveness, genome size, and GC content).","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.2174/0115748936299440240709070105","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Genome assembly tools are used to reconstruct genomic sequences from raw sequencing data, which are then used for identifying the organisms present in a metagenomic sample. Methodology: More recently, machine learning approaches have been applied to a variety of bioinformatics problems, and in this paper, we explore their use for organism identification. We start by evaluating several commonly used metagenomic assembly tools, including PhyloFlash, MEGAHIT, MetaSPAdes, Kraken2, Mothur, UniCycler, and PathRacer, and compare them against state-of-theart deep learning-based machine learning classification approaches represented by DNABERT and DeLUCS, in the context of two synthetic mock community datasets. Result: Our analysis focuses on determining whether ensembling metagenome assembly tools with machine learning tools have the potential to improve identification performance relative to using the tools individually. Conclusion: We find that this is indeed the case, and analyze the level of effectiveness of potential tool ensembling for organisms with different characteristics (based on factors such as repetitiveness, genome size, and GC content).
比较核糖体组装和机器学习工具在微生物鉴定中的不同特性
背景:基因组组装工具用于从原始测序数据中重建基因组序列,然后用于识别元基因组样本中存在的生物。方法:最近,机器学习方法被应用于各种生物信息学问题,在本文中,我们探讨了机器学习方法在生物识别中的应用。我们首先评估了几种常用的元基因组组装工具,包括 PhyloFlash、MEGAHIT、MetaSPAdes、Kraken2、Mothur、UniCycler 和 PathRacer,并以两个合成模拟群落数据集为背景,将它们与以 DNABERT 和 DeLUCS 为代表的基于深度学习的先进机器学习分类方法进行比较。结果我们的分析重点是确定将元基因组组装工具与机器学习工具组装在一起是否有可能比单独使用这些工具提高识别性能。结论我们发现情况确实如此,并分析了针对具有不同特征(基于重复性、基因组大小和 GC 含量等因素)的生物的潜在工具组合的有效性水平。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Current Bioinformatics
Current Bioinformatics 生物-生化研究方法
CiteScore
6.60
自引率
2.50%
发文量
77
审稿时长
>12 weeks
期刊介绍: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications on key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信