Predicting host-pathogen interactions with machine learning algorithms: A scoping review

IF 2.6, Region 4 (Medicine), JCR Q3 (Infectious Diseases)
Rasool Sahragard, Masoud Arabfard, Ali Najafi
Infection Genetics and Evolution, volume 130, Article 105751. Published 2025-04-10. Journal article.
DOI: 10.1016/j.meegid.2025.105751
https://www.sciencedirect.com/science/article/pii/S1567134825000401
Citations: 0

Predicting host-pathogen interactions with machine learning algorithms: A scoping review

Background

Diseases caused by pathogenic microorganisms pose a persistent global health challenge. Pathogens exploit host mechanisms through intricate molecular interactions. Understanding these host-pathogen interactions (HPIs), particularly protein-protein interactions (PPIs), is crucial for developing therapeutic strategies. While experimental approaches are essential, they are often labor-intensive and costly. Researchers have been able to predict HPIs more efficiently due to recent advances in artificial intelligence and machine learning. However, existing reviews lack a systematic evaluation of different machine learning methodologies and their effectiveness.

Methods

This scoping review critically examines recent studies on machine learning-based HPI prediction, categorizing them by host and pathogen types, machine learning algorithms, and key evaluation metrics. The study began with a preliminary search of reputable databases, using key phrases related to host-pathogen interactions, for articles published from 2019 to 2024. This search yielded 46 relevant articles, of which 30 were selected for review after evaluating titles and abstracts.

Results

Our findings indicate that tree-based algorithms, particularly Random Forest and Gradient Boosting, are the most prevalent in HPI prediction. The filtered articles were categorized by host and pathogen type and further subdivided into four subcategories based on the prediction type and machine learning algorithms: classic, tree-based, vector-based, and neural network algorithms. Deep learning models such as convolutional and recurrent neural networks demonstrate promising accuracy, but they require large amounts of labeled data for effective training. Additionally, the analysis uncovers significant gaps in dataset standardization and model interpretability, which pose challenges to the broader applicability of these predictive models.

Conclusion

In this review, we emphasize the potential of machine learning in HPI prediction and highlight the important challenges that must be addressed to improve predictive accuracy. Unlike previous reviews, our study systematically compares different computational approaches, offering a roadmap for future research. The findings emphasize the importance of dataset quality, feature selection, and model transparency in advancing AI-driven pathogen research.
Source journal
Infection Genetics and Evolution (Medicine: Infectious Diseases)
CiteScore: 8.40
Self-citation rate: 0.00%
Articles published: 215
Review time: 82 days
About the journal: (aka Journal of Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases -- MEEGID) Infectious diseases constitute one of the main challenges to medical science in the coming century. The impressive development of molecular megatechnologies and of bioinformatics has greatly increased our knowledge of the evolution, transmission and pathogenicity of infectious diseases. Research has shown that host susceptibility to many infectious diseases has a genetic basis. Furthermore, much is now known on the molecular epidemiology, evolution and virulence of pathogenic agents, as well as their resistance to drugs, vaccines, and antibiotics. Equally, research on the genetics of disease vectors has greatly improved our understanding of their systematics, has increased our capacity to identify target populations for control or intervention, and has provided detailed information on the mechanisms of insecticide resistance. However, the genetics and evolutionary biology of hosts, pathogens and vectors have tended to develop as three separate fields of research. This artificial compartmentalisation is of concern due to our growing appreciation of the strong co-evolutionary interactions among hosts, pathogens and vectors. Infection, Genetics and Evolution and its companion congress [MEEGID](http://www.meegidconference.com/) (for Molecular Epidemiology and Evolutionary Genetics of Infectious Diseases) are the main forum acting for the cross-fertilization between evolutionary science and biomedical research on infectious diseases. Infection, Genetics and Evolution is the only journal that welcomes articles dealing with the genetics and evolutionary biology of hosts, pathogens and vectors, and coevolution processes among them in relation to infection and disease manifestation. All infectious models enter the scope of the journal, including pathogens of humans, animals and plants, either parasites, fungi, bacteria, viruses or prions. The journal welcomes articles dealing with genetics, population genetics, genomics, postgenomics, gene expression, evolutionary biology, population dynamics, mathematical modeling and bioinformatics.