NLP-like deep learning aided in identification and validation of thiosulfinate tolerance clusters in diverse bacteria.

IF 3.1 · Zone 2 (Biology) · Q2 MICROBIOLOGY
mSphere e0002325 · Pub Date: 2025-07-29 · Epub Date: 2025-06-17 · DOI: 10.1128/msphere.00023-25
Brendon K Myers, Anuj Lamichhane, Brian H Kvitko, Bhabesh Dutta
Citations: 0

Abstract


Allicin tolerance (alt) clusters in phytopathogenic bacteria, which provide resistance to thiosulfinates such as allicin, are challenging to find using conventional approaches because of their varied architecture and the paradox of being vertically maintained within genera despite likely being horizontally transferred. This results in substantial sequence diversity that further complicates their identification. Natural language processing (NLP)-like techniques, such as those used in DeepBGC, offer a promising solution by treating gene clusters as a language, allowing gene clusters to be identified and collected based on patterns and relationships within the sequences. We curated and validated alt-like clusters in Pantoea ananatis 97-1R, Burkholderia gladioli pv. gladioli FDAARGOS 389, and Pseudomonas syringae pv. tomato DC3000. Leveraging sequences from the RefSeq bacterial database, we conducted comparative analyses of gene synteny, gene/protein sequences, protein structures, and predicted protein interactions. This approach enabled the discovery of several novel alt-like clusters, previously undetectable by other methods, which were further validated experimentally. Our work highlights the effectiveness of NLP-like techniques for identifying underrepresented gene clusters and expands our understanding of the diversity and utility of alt-like clusters in diverse bacterial genera. It also demonstrates the potential of these techniques to simplify the identification process and enhance the applicability of biological data in real-world scenarios.

IMPORTANCE

Thiosulfinates, such as allicin, are potent antifeedants and antimicrobials produced by Allium species and pose a challenge for phytopathogenic bacteria. Phytopathogenic bacteria have been shown to use an allicin tolerance (alt) gene cluster to circumvent this host response, leading to economically significant yield losses. Because these clusters are complex to mine, we applied techniques akin to natural language processing to analyze Pfam domains and gene proximity. This approach identified novel alt-like gene clusters, showcasing the potential of artificial intelligence to reveal elusive and underrepresented gene clusters and to enhance our understanding of their diversity and role across bacterial genera.
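The abstract frames a genome as text: Pfam domains act as "words" and gene order supplies the context in which they co-occur. The sketch below illustrates only that framing, under stated assumptions: genes are already annotated with Pfam domains, and a candidate cluster is any run of neighbouring genes whose domains cover at least a chosen fraction of a reference domain set. The names `Gene`, `scan_for_clusters`, `window` and `min_score`, and the `DOM_*` tokens, are hypothetical. This is a deliberately simple, non-learned co-occurrence scan. It is not the authors' pipeline and not DeepBGC, which learns Pfam-domain embeddings and scores domain sequences with a trained neural network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    locus_tag: str
    pfam_domains: tuple[str, ...]  # Pfam domain "words" annotated on this gene


def scan_for_clusters(genes, reference_domains, window=5, min_score=0.6):
    """Return (first_locus, last_locus, best_score) for each candidate region.

    A window's score is the fraction of the reference domain set found among
    `window` consecutive genes (a simple proximity-based co-occurrence test).
    """
    ref = set(reference_domains)
    if not ref or not genes:
        return []

    # 1. Slide a fixed-size window of neighbouring genes along the genome.
    hits = []
    for start in range(max(1, len(genes) - window + 1)):
        block = genes[start:start + window]
        present = {d for g in block for d in g.pfam_domains} & ref
        score = len(present) / len(ref)
        if score >= min_score:
            hits.append((start, start + len(block) - 1, score))

    # 2. Merge overlapping qualifying windows into contiguous regions.
    regions = []
    for s, e, sc in hits:
        if regions and s <= regions[-1][1]:
            ps, pe, psc = regions[-1]
            regions[-1] = (ps, max(pe, e), max(psc, sc))
        else:
            regions.append((s, e, sc))

    # 3. Trim each region to the outermost genes carrying a reference domain.
    results = []
    for s, e, sc in regions:
        idx = [i for i in range(s, e + 1) if ref & set(genes[i].pfam_domains)]
        if idx:
            results.append((genes[idx[0]].locus_tag, genes[idx[-1]].locus_tag, sc))
    return results


# Hypothetical toy genome: the DOM_* tokens are placeholders, not real Pfam IDs.
reference = ["DOM_A", "DOM_B", "DOM_C", "DOM_D", "DOM_E"]
genome = [
    Gene("g1", ("DOM_X",)),
    Gene("g2", ("DOM_Y",)),
    Gene("g3", ("DOM_A",)),
    Gene("g4", ("DOM_B", "DOM_C")),
    Gene("g5", ("DOM_D",)),
    Gene("g6", ("DOM_Z",)),
    Gene("g7", ("DOM_Y",)),
    Gene("g8", ("DOM_X",)),
]
print(scan_for_clusters(genome, reference, window=3, min_score=0.6))
# → [('g3', 'g5', 0.8)]
```

The toy region lacks one of the five reference domains yet still scores 0.8. This shows the kind of tolerance to varied cluster architecture the abstract calls for. A learned model would replace the fixed coverage threshold with a score that can also absorb sequence-level divergence.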

Source journal: mSphere
Subject area: Immunology and Microbiology: Microbiology
CiteScore: 8.50
Self-citation rate: 2.10%
Articles published: 192
Review time: 11 weeks
About the journal: mSphere™ is a multi-disciplinary open-access journal that will focus on rapid publication of fundamental contributions to our understanding of microbiology. Its scope will reflect the immense range of fields within the microbial sciences, creating new opportunities for researchers to share findings that are transforming our understanding of human health and disease, ecosystems, neuroscience, agriculture, energy production, climate change, evolution, biogeochemical cycling, and food and drug production. Submissions will be encouraged of all high-quality work that makes fundamental contributions to our understanding of microbiology. mSphere™ will provide streamlined decisions, while carrying on ASM's tradition for rigorous peer review.