Peng Gao, Long Xu, Yuan Bai, Qiuzhen Lin, Junkai Ji, Lijia Ma
IEEE Journal of Biomedical and Health Informatics. Published 2025-06-24. DOI: 10.1109/JBHI.2025.3582652 (https://doi.org/10.1109/JBHI.2025.3582652)
Phage Host Prediction Using Deep Neural Network with Multi-source Protein Language Models and Squeeze-and-Excitation Attention Mechanism.
Phage therapy (PT) has become a promising alternative for treating infections as antimicrobial resistance increases. In PT, phages bind to specific receptors on bacterial surfaces via receptor-binding proteins (RBPs), enabling precise destruction of targeted hosts. A key issue in PT is phage host prediction (PHP), which aims to match therapeutic phages to pathogenic hosts. However, traditional PHP methods are hindered by time-consuming and expensive wet-lab experiments, while recent computational methods neglect the evolutionary diversity and local feature patterns of RBPs. In this article, we propose a novel deep neural network, called PHPRBP, for PHP based on phage RBPs. PHPRBP first uses pre-trained protein language models (i.e., ESM2 and ProtT5) to learn multi-source embedding representations of these RBPs, revealing diverse and complementary features. It then applies an adaptive synthetic technique to augment minority-class samples, addressing the data scarcity issue. Next, a deep neural network architecture uses a convolutional neural network to capture local sequence features and a squeeze-and-excitation attention mechanism to enhance the contribution of important features. Finally, a fully connected network performs host prediction. Experimental results show that PHPRBP outperforms state-of-the-art methods in host prediction at both the genus and species levels. The data and code of PHPRBP are available at https://github.com/a1678019300/PHPRBP.
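The squeeze-and-excitation step named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (the linked repository is the authoritative source). It assumes the generic form of the mechanism: global average pooling per channel, a two-layer bottleneck without bias terms, and sigmoid gating. The function name `squeeze_excite`, the reduction ratio of 2, the toy dimensions, and the random weights are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(feature_map, w1, w2):
    """Generic squeeze-and-excitation over a 1D feature map (illustrative only).

    feature_map: list of C channels, each a list of L floats
                 (e.g. the output of a 1D convolution).
    w1: (C // r) x C weight matrix of the bottleneck layer (assumed, no bias).
    w2: C x (C // r) weight matrix of the expansion layer (assumed, no bias).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: average each channel over all sequence positions.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation, part 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, part 2: expansion layer followed by sigmoid gating.
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * x for x in channel] for g, channel in zip(gates, feature_map)]
    return scaled, gates


# Toy input: 4 channels, 5 positions, reduction ratio 2 (all hypothetical).
rng = random.Random(0)
channels, length, reduction = 4, 5, 2
feature_map = [[rng.uniform(0.0, 1.0) for _ in range(length)]
               for _ in range(channels)]
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(channels)]
      for _ in range(channels // reduction)]
w2 = [[rng.uniform(-1.0, 1.0) for _ in range(channels // reduction)]
      for _ in range(channels)]

scaled, gates = squeeze_excite(feature_map, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so a channel can be attenuated but never amplified or sign-flipped. In a trained network the two weight matrices are learned, which lets the model downweight channels that carry little information for the host label.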
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.