DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes.

Impact factor 6.8; Zone 2, Biology; JCR Q1, Biochemical Research Methods
Tongqing Wei, Chenqi Lu, Hanxiao Du, Qianru Yang, Xin Qi, Yankun Liu, Yi Zhang, Chen Chen, Yutong Li, Yuanhao Tang, Wen-Hong Zhang, Xu Tao, Ning Jiang
Briefings in Bioinformatics. Published 2024-09-23. DOI: 10.1093/bib/bbae484 (https://doi.org/10.1093/bib/bbae484). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11440089/pdf/

Abstract

Phages, the natural predators of bacteria, were discovered more than 100 years ago, and rising antimicrobial resistance rates have revitalized phage research. Methods that are less time-consuming and more efficient than wet-laboratory experiments are needed to screen phages quickly for therapeutic use. Traditional computational methods usually ignore the fact that phage-bacteria interactions are mediated by key genes and proteins. Methods for intraspecific prediction are rare, since almost all existing methods consider interactions only at the species and genus levels. Moreover, most strains in existing databases contain only partial genome information because whole-genome information for species is difficult to obtain. Here, we propose a new approach to interaction prediction that constructs new features from key genes and proteins and applies K-means sampling to select high-quality negative samples. We then develop DeepPBI-KG, a corresponding prediction tool based on feature selection and a deep neural network. The results show that the average area under the curve (AUC) reached 0.93 for each strain, and that the overall AUC and area under the precision-recall curve reached 0.89 and 0.92, respectively, on the independent test set; these values are greater than those of other existing prediction tools. The forward and reverse validation results indicate that key genes and key proteins regulate and influence the interaction, which supports the reliability of the model. In addition, intraspecific prediction experiments based on Klebsiella pneumoniae data demonstrate the potential applicability of DeepPBI-KG to intraspecific prediction. In summary, the feature engineering and interaction prediction approaches proposed in this study can effectively improve the robustness and stability of interaction prediction, can achieve high generalizability, and may provide new directions and insights for rapid phage screening for therapy.
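
The abstract does not say how K-means sampling is applied to choose negative samples. One common scheme, assumed here purely for illustration, clusters the unlabeled phage-bacteria feature vectors and draws the same number of candidates from every cluster, so that the negatives cover the feature space instead of concentrating in one dense region. The function names (`kmeans`, `sample_negatives`) and parameters (`k`, `per_cluster`) are hypothetical and do not come from DeepPBI-KG.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_negatives(unlabeled, k, per_cluster, seed=0):
    """Return sorted indices of up to `per_cluster` points from each cluster."""
    labels = kmeans(unlabeled, k, seed=seed)
    rng = random.Random(seed)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        chosen.extend(members[:per_cluster])
    return sorted(chosen)


if __name__ == "__main__":
    # Toy feature vectors standing in for unlabeled phage-bacteria pairs.
    candidates = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                  (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(sample_negatives(candidates, k=2, per_cluster=2))
```

Drawing evenly across clusters is only one possible design; selecting the points nearest each centroid, or the points farthest from known positives, would be equally plausible readings of "K-means sampling".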

Source journal: Briefings in Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.