PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria
Yasmmin C. Martins, Maiana O. Cerqueira e Costa, Miranda C. Palumbo, Dario F. Do Porto, Fábio L. Custódio, Raphael Trevizani and Marisa Fabiana Nicolás*
ACS Omega, vol. 10, no. 6, pp. 5415–5429. Published 2025-02-02. DOI: 10.1021/acsomega.4c07147
https://pubs.acs.org/doi/10.1021/acsomega.4c07147
Abstract
Antigenicity prediction plays a crucial role in vaccine development, antibody-based therapies, and diagnostic assays, as this predictive approach helps assess the potential of molecular structures to induce and recruit immune cells and drive antibody production. Several existing prediction methods, which target complete proteins and epitopes identified through reverse vaccinology, face limitations regarding input data constraints, feature extraction strategies, and insufficient flexibility for model evaluation and interpretation. This work presents PAPreC (Pipeline for Antigenicity Prediction Comparison), an open-source, versatile workflow (available at https://github.com/YasCoMa/paprec_nx_workflow) designed to address these challenges. PAPreC systematically examines three key factors: the selection of training data sets, feature extraction methods (including physicochemical descriptors and ESM-2 encoder-derived embeddings), and diverse classifiers. It provides automated model evaluation, interpretability through SHapley Additive exPlanations (SHAP) analysis, and applicability domain assessments, enabling researchers to identify optimal configurations for their specific data sets. Applying PAPreC to IEDB data as a reference, we demonstrate its effectiveness across the ESKAPE pathogen group. A case study involving Pseudomonas aeruginosa and Staphylococcus aureus shows that specific feature configurations are more suitable for different sequence types, and that ESM-2 embeddings enhance model performance. Moreover, our results indicate that separate models for Gram-positive and Gram-negative bacteria are not required. PAPreC offers a comprehensive, adaptable, and robust framework to streamline and improve antigenicity prediction for diverse bacterial data sets.
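The comparison the abstract describes (feature extraction methods crossed with classifiers, each configuration scored automatically) can be illustrated with a minimal sketch. This is not code from the PAPreC repository. The two feature extractors, the two classifiers, the leave-one-out scoring, and the toy peptides with their labels are all hypothetical stand-ins chosen so the example runs on the standard library alone. In particular, it does not compute ESM-2 embeddings or SHAP values.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def aa_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def physchem(seq):
    """Coarse descriptors: mean hydropathy, fraction K/R, fraction D/E."""
    n = len(seq)
    return [sum(KD[a] for a in seq) / n,
            sum(seq.count(a) for a in "KR") / n,
            sum(seq.count(a) for a in "DE") / n]


def nearest_centroid(train_X, train_y, x):
    """Predict the label whose class mean is closest to x."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [v for v, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def one_nearest_neighbour(train_X, train_y, x):
    """Predict the label of the single closest training vector."""
    idx = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[idx]


def leave_one_out_accuracy(featurize, classify, seqs, labels):
    """Hold out each sequence once, train on the rest, count correct calls."""
    X = [featurize(s) for s in seqs]
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += classify(train_X, train_y, X[i]) == labels[i]
    return hits / len(X)


def compare(seqs, labels):
    """Score every (feature extractor, classifier) pair on the same data."""
    features = {"aa_composition": aa_composition, "physchem": physchem}
    classifiers = {"nearest_centroid": nearest_centroid,
                   "1nn": one_nearest_neighbour}
    return {(f, c): leave_one_out_accuracy(features[f], classifiers[c],
                                           seqs, labels)
            for f, c in product(features, classifiers)}


# Made-up peptides with arbitrary labels; NOT real antigenicity data.
TOY_SEQS = ["KKDEKRDEKK", "DEKRKDEEKR", "RKDDEKKRED",
            "AVLIVLAVIL", "LLVAIVLAIV", "IVLLAVVILA"]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

results = compare(TOY_SEQS, TOY_LABELS)
for (feature_name, clf_name), acc in sorted(results.items()):
    print(f"{feature_name:>15} + {clf_name:<17} accuracy = {acc:.2f}")
```

Keeping the extractors and classifiers in plain dictionaries means a further option, such as an embedding-based featurizer, could be added as one more entry without touching the evaluation loop.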
ACS Omega (Chemical Engineering: General Chemical Engineering)
CiteScore: 6.60
Self-citation rate: 4.90%
Articles published: 3945
Review time: 2.4 months
About the journal:
ACS Omega is an open-access global publication for scientific articles that describe new findings in chemistry and interfacing areas of science, without any perceived evaluation of immediate impact.