Simranjit Grewal, Uwa Iyamu, Daniel Ferrer Vinals, Catherine J Mitran, Nidhi Hegde, Stephanie K Yanow
Machine learning framework to extract physicochemical features of B-cell epitopes recognized by a cross-reactive antibody.
npj Systems Biology and Applications 11(1): 109. Published 2 October 2025. DOI: 10.1038/s41540-025-00583-1
Abstract
During infection with Plasmodium falciparum in pregnancy, parasites express a unique virulence factor, VAR2CSA, that mediates binding of infected red blood cells to the placenta. A major goal in designing vaccines to protect pregnant women from malaria is to elicit antibodies to VAR2CSA. The challenge is that VAR2CSA is highly polymorphic, so identifying conserved epitopes is essential to elicit strain-transcending immunity. Unexpectedly, a mouse monoclonal antibody, 3D10, raised against region II of the unrelated Duffy binding protein from P. vivax (DBPII), cross-reacts with diverse alleles of VAR2CSA in vitro, suggesting that epitopes may be shared across this family of 'Duffy binding-like' (DBL) proteins. Peptide arrays spanning four DBL proteins from two Plasmodium spp., namely two alleles of VAR2CSA, DBPII, and PvEBP2 (as a negative control), were screened with 3D10, but the data were too complex for common epitope sequences to be identified manually. We therefore designed a machine learning framework to analyse the array data. We applied decision trees to extract features correlated with 3D10 binding and evaluated the model on an independent dataset for a rodent Plasmodium DBL protein (PcDBP). Next, we analysed patterns of the features that the model predicted to be strongly associated with 3D10 binding and designed mutant peptides to test complex sequence motifs. Features associated with 3D10 reactivity were mapped onto predicted 3D structures of Plasmodium proteins and validated against 3D10 reactivity to the recombinant antigens. While the array data identified certain linear epitopes, the framework predicted other epitopes to be conformational. This was demonstrated with PcDBP: as predicted by the model, no linear peptides reacted strongly with 3D10, yet the antibody recognized the folded protein in a conformation-dependent manner.
With this approach, peptide array data can be mined to extract physicochemical properties of epitopes recognized by cross-reactive antibodies.
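The abstract does not specify which physicochemical features, tree depth, or software were used, so the following is only a minimal sketch of how a decision tree can relate peptide features to binding. It assumes two hypothetical features per peptide (mean Kyte-Doolittle hydropathy and a crude net charge) and a hypothetical binary label (1 = strong 3D10 signal, 0 = weak), then searches for the single feature and threshold that minimise weighted Gini impurity. This is a decision stump, i.e. only the first split of a decision tree; the functions `features`, `gini`, and `best_split` and the example peptides are invented for illustration and are not the authors' method or data.

```python
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def features(peptide: str) -> Dict[str, float]:
    """Compute two hypothetical physicochemical features of a linear peptide."""
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)
    # Crude net charge near neutral pH: K/R count +1, D/E count -1, His ignored.
    charge = sum((aa in "KR") - (aa in "DE") for aa in peptide)
    return {"hydropathy": hydropathy, "net_charge": float(charge)}


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (0 = weak, 1 = strong binding)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows: List[Dict[str, float]], labels: Sequence[int]) -> Tuple[str, float, float]:
    """Return (feature, threshold, weighted Gini) of the best single split.

    This is one level of a decision tree (a decision stump); a full tree
    would repeat the same search recursively on each side of the split.
    """
    best: Tuple[str, float, float] = ("", 0.0, float("inf"))
    n = len(labels)
    for name in rows[0]:
        values = sorted({row[name] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[name] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[name] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (name, threshold, score)
    return best


if __name__ == "__main__":
    # Invented peptides and labels for illustration only; not data from the study.
    peptides = ["KLIVKR", "DEEASD", "RKKAKR", "LIVAGS"]
    labels = [1, 0, 1, 0]
    rows = [features(p) for p in peptides]
    print(best_split(rows, labels))  # ('net_charge', 1.5, 0.0)
```

In this toy input, net charge separates the two invented classes perfectly while hydropathy does not, which is the kind of signal a tree surfaces when it ranks features by impurity reduction. With real array data, a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally replace the hand-written search.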
About the journal:
npj Systems Biology and Applications is an online Open Access journal dedicated to publishing premier research that takes a systems-oriented approach. The journal aims to provide a forum for the presentation of articles that help define this nascent field, as well as those that apply the advances to wider fields. We encourage studies that integrate, or aid the integration of, data, analyses and insight from molecules to organisms and broader systems. Important areas of interest include not only fundamental biological systems and drug discovery, but also applications to health, medical practice and implementation, big data, biotechnology, food science, human behaviour, broader biological systems and industrial applications of systems biology.
We encourage all approaches, including network biology, application of control theory to biological systems, computational modelling and analysis, comprehensive and/or high-content measurements, theoretical, analytical and computational studies of system-level properties of biological systems and computational/software/data platforms enabling such studies.