Applying rearrangement distances to enable plasmid epidemiology with pling.

IF 4.0 · Zone 2 (Biology) · JCR Q1 (GENETICS & HEREDITY)
Daria Frolova, Leandro Lima, Leah Wendy Roberts, Leonard Bohnenkämper, Roland Wittler, Jens Stoye, Zamin Iqbal
Journal: Microbial Genomics
Publication date: 2024-10-01
Publication type: Journal Article
DOI: 10.1099/mgen.0.001300 (https://doi.org/10.1099/mgen.0.001300)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472880/pdf/
Citation count: 0

Abstract

Plasmids are a key vector of antibiotic resistance, but the current bioinformatics toolkit is not well suited to tracking them. The rapid structural changes seen in plasmid genomes present considerable challenges to evolutionary and epidemiological analysis. Typical approaches are either low resolution (replicon typing) or use shared k-mer content to define a genetic distance. However, this distance can both overestimate plasmid relatedness by ignoring rearrangements, and underestimate by over-penalizing gene gain/loss. Therefore a model is needed which captures the key components of how plasmid genomes evolve structurally - through gene/block gain or loss, and rearrangement. A secondary requirement is to prevent promiscuous transposable elements (TEs) leading to over-clustering of unrelated plasmids. We choose the 'Double Cut and Join Indel' (DCJ-Indel) model, in which plasmids are studied at a coarse level, as a sequence of signed integers (representing genes or aligned blocks), and the distance between two plasmids is the minimum number of rearrangement events or indels needed to transform one into the other. We show how this gives much more meaningful distances between plasmids. We introduce a software workflow pling (https://github.com/iqbal-lab-org/pling), which uses the DCJ-Indel model, to calculate distances between plasmids and then cluster them. In our approach, we combine containment distances and DCJ-Indel distances to build a TE-aware plasmid network. We demonstrate superior performance and interpretability to other plasmid clustering tools on the 'Russian Doll' dataset and a hospital transmission dataset.
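
To make the signed-integer representation concrete, here is a minimal Python sketch of the plain DCJ distance (without indels) between two circular, single-chromosome genomes that share exactly the same blocks. It uses the standard result that this distance equals N - C, where N is the number of blocks and C is the number of cycles in the adjacency graph. This is an illustration under simplifying assumptions, not pling's implementation: the full DCJ-Indel model also handles blocks present in only one plasmid, which this sketch rejects, and the function names are hypothetical.

```python
def adjacencies(genome):
    """Map each block extremity to its neighbour in a circular signed genome.

    Each block g has a tail (g, "t") and a head (g, "h"). A block read in
    the forward direction runs tail -> head; a negative sign flips it.
    """
    def ends(g):
        # Return (left, right) extremities in reading direction.
        if g > 0:
            return (g, "t"), (g, "h")
        return (-g, "h"), (-g, "t")

    partner = {}
    n = len(genome)
    for i in range(n):
        right = ends(genome[i])[1]
        left = ends(genome[(i + 1) % n])[0]  # wrap around: circular genome
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance_circular(a, b):
    """Plain DCJ distance (no indels) between two circular genomes.

    Assumes both genomes contain the same blocks, each exactly once.
    """
    blocks_a = sorted(abs(g) for g in a)
    blocks_b = sorted(abs(g) for g in b)
    if blocks_a != blocks_b or len(set(blocks_a)) != len(blocks_a):
        raise ValueError("sketch requires identical, duplicate-free block content")

    pa, pb = adjacencies(a), adjacencies(b)
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between adjacencies of a and b until the cycle closes.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(a) - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([1, 2, 3], [1, -2, 3]))        # inversion -> 1
    print(dcj_distance_circular([1, 2, 3, 4], [1, 3, 2, 4]))   # transposition -> 2
```

Rotations and whole-genome reverse complements of a circular sequence leave the adjacencies unchanged, so they score 0, which matches the intuition that they describe the same plasmid.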

Source journal: Microbial Genomics (Medicine-Epidemiology)
CiteScore: 6.60
Self-citation rate: 2.60%
Articles published: 153
Review time: 12 weeks
About the journal: Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.