MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model.

IF 11.8 2区 生物学 Q1 MULTIDISCIPLINARY SCIENCES
Tao Feng, Shufang Wu, Hongwei Zhou, Zhencheng Fang
{"title":"MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model.","authors":"Tao Feng, Shufang Wu, Hongwei Zhou, Zhencheng Fang","doi":"10.1093/gigascience/giae047","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Mobilization typing (MOB) is a classification scheme for plasmid genomes based on their relaxase gene. The host ranges of plasmids of different MOB categories are diverse, and MOB is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging due to the highly fragmented characteristics of metagenomic contigs.</p><p><strong>Results: </strong>We developed MOBFinder, an 11-class classifier, for categorizing plasmid fragments into 10 MOB types and a nonmobilizable category. We first performed MOB typing to classify complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to predict fragments of different lengths. Evaluating the tool using the benchmark dataset, we found that MOBFinder outperforms previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that the MOBF type plasmid, which is widely present in Escherichia and Klebsiella, and the MOBQ type plasmid might accelerate antibiotic resistance transmission in patients with T2D.</p><p><strong>Conclusions: </strong>To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8000,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299106/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giae047","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Mobilization typing (MOB) is a classification scheme for plasmid genomes based on their relaxase gene. The host ranges of plasmids of different MOB categories are diverse, and MOB is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging due to the highly fragmented characteristics of metagenomic contigs.

Results: We developed MOBFinder, an 11-class classifier, for categorizing plasmid fragments into 10 MOB types and a nonmobilizable category. We first performed MOB typing to classify complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to predict fragments of different lengths. Evaluating the tool using the benchmark dataset, we found that MOBFinder outperforms previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that the MOBF type plasmid, which is widely present in Escherichia and Klebsiella, and the MOBQ type plasmid might accelerate antibiotic resistance transmission in patients with T2D.

Conclusions: To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.

MOBFinder:基于语言模型的质粒元基因组片段动员分型工具。
背景:动员分型(MOB)是一种基于松弛酶基因的质粒基因组分类方法。不同 MOB 类别的质粒的宿主范围各不相同,MOB 对于研究质粒的迁移,尤其是抗性基因和毒力因子的传播至关重要。然而,由于元基因组等位基因高度碎片化的特点,对质粒元基因组数据进行MOB分型具有挑战性:我们开发了 MOBFinder(一种 11 级分类器),用于将质粒片段分为 10 种 MOB 类型和一种不可移动类型。我们首先进行了MOB分型,根据松弛酶信息对完整质粒基因组进行分类,然后从MOB类型注释良好的完整质粒基因组中构建了质粒元基因组片段(PMF)人工基准数据集。接下来,基于自然语言模型,我们使用词向量来描述 PMF。我们训练并整合了多个随机森林分类模型,以预测不同长度的片段。通过使用基准数据集对工具进行评估,我们发现 MOBFinder 优于 MOBscan 和 MOB-suite 等以前的工具,其总体准确率比 MOB-suite 高出约 59%。此外,某些 MOB 类型的平衡准确率、调和平均值和 F1 分数高达 99%。当应用于2型糖尿病(T2D)患者队列时,MOBFinder提供的见解表明,广泛存在于埃希氏菌和克雷伯氏菌中的MOBF型质粒和MOBQ型质粒可能会加速T2D患者的抗生素耐药性传播:据我们所知,MOBFinder 是首个对 PMF 进行 MOB 分型的工具。该工具可在 https://github.com/FengTaoSMU/MOBFinder 免费获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
GigaScience
GigaScience MULTIDISCIPLINARY SCIENCES-
CiteScore
15.50
自引率
1.10%
发文量
119
审稿时长
1 weeks
期刊介绍: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信