Predicting host-pathogen interactions with machine learning algorithms: A scoping review
Authors: Rasool Sahragard, Masoud Arabfard, Ali Najafi
Journal: Infection Genetics and Evolution, volume 130, Article 105751
DOI: 10.1016/j.meegid.2025.105751
Publication date: 2025-04-10 (Journal Article)
Impact factor: 2.6; JCR: Q3 (Infectious Diseases)
URL: https://www.sciencedirect.com/science/article/pii/S1567134825000401
Citations: 0
Abstract
Background
Diseases caused by pathogenic microorganisms pose a persistent global health challenge. Pathogens exploit host mechanisms through intricate molecular interactions. Understanding these host-pathogen interactions (HPIs), particularly protein-protein interactions (PPIs), is crucial for developing therapeutic strategies. While experimental approaches are essential, they are often labor-intensive and costly. Researchers have been able to predict HPIs more efficiently due to recent advances in artificial intelligence and machine learning. However, existing reviews lack a systematic evaluation of different machine learning methodologies and their effectiveness.
Methods
This scoping review critically examines recent studies on machine learning-based host-pathogen interaction (HPI) prediction, categorizing them by host and pathogen types, machine learning algorithms, and key evaluation metrics. The study began with a preliminary search of reputable databases, using key phrases related to host-pathogen interactions, for work published from 2019 to 2024. This process yielded 46 relevant articles, from which 30 were selected for review after evaluating titles and abstracts.
Results
Our findings indicate that tree-based algorithms, particularly Random Forest and Gradient Boosting, are the most prevalent in HPI prediction. The filtered articles were categorized by host and pathogen type and further subdivided into four subcategories based on the prediction type and machine learning algorithm: classic, tree-based, vector-based, and neural network algorithms. Deep learning models such as convolutional and recurrent neural networks demonstrate promising accuracy, but they require large amounts of labeled data for effective training. Additionally, the analysis uncovers significant gaps in dataset standardization and model interpretability, which pose challenges to the broader applicability of these predictive models.
Conclusion
In this review, we emphasize the potential of machine learning in HPI prediction and highlight the important challenges that must be addressed to improve predictive accuracy. Unlike previous reviews, our study systematically compares different computational approaches, offering a roadmap for future research. The findings emphasize the importance of dataset quality, feature selection, and model transparency in advancing AI-driven pathogen research.
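The abstract refers to "key evaluation metrics" and "predictive accuracy" without naming specific measures. As a rough illustration only, the sketch below computes several metrics commonly reported for binary interaction prediction (accuracy, precision, recall, F1, and the Matthews correlation coefficient). The choice of metrics, the function names, and the toy labels are assumptions made for this example and are not taken from the review.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels.

    Assumed encoding (hypothetical): 1 = interacting protein pair,
    0 = non-interacting pair.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Return common binary-classification metrics as a dict.

    Any metric whose denominator is zero is reported as 0.0.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient: stays informative when the
    # positive and negative classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Toy labels, invented purely for illustration.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Reporting MCC or F1 alongside accuracy is a common choice when non-interacting pairs far outnumber interacting ones, since accuracy alone can look high for a model that rarely predicts an interaction.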
About the journal:
(aka Journal of Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases -- MEEGID)
Infectious diseases constitute one of the main challenges to medical science in the coming century. The impressive development of molecular megatechnologies and of bioinformatics has greatly increased our knowledge of the evolution, transmission and pathogenicity of infectious diseases. Research has shown that host susceptibility to many infectious diseases has a genetic basis. Furthermore, much is now known about the molecular epidemiology, evolution and virulence of pathogenic agents, as well as their resistance to drugs, vaccines, and antibiotics. Equally, research on the genetics of disease vectors has greatly improved our understanding of their systematics, has increased our capacity to identify target populations for control or intervention, and has provided detailed information on the mechanisms of insecticide resistance.
However, the genetics and evolutionary biology of hosts, pathogens and vectors have tended to develop as three separate fields of research. This artificial compartmentalisation is of concern due to our growing appreciation of the strong co-evolutionary interactions among hosts, pathogens and vectors.
Infection, Genetics and Evolution and its companion congress [MEEGID](http://www.meegidconference.com/) (for Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases) are the main forum acting for the cross-fertilization between evolutionary science and biomedical research on infectious diseases.
Infection, Genetics and Evolution is the only journal that welcomes articles dealing with the genetics and evolutionary biology of hosts, pathogens and vectors, and coevolution processes among them in relation to infection and disease manifestation. All infectious models enter the scope of the journal, including pathogens of humans, animals and plants, either parasites, fungi, bacteria, viruses or prions. The journal welcomes articles dealing with genetics, population genetics, genomics, postgenomics, gene expression, evolutionary biology, population dynamics, mathematical modeling and bioinformatics. We also provide many author benefits, such as free PDFs, a liberal copyright policy, special discounts on Elsevier publications and much more.