Ekaterina S Porfireva, Anton D Zadorozhny, Anastasia V Rudik, Dmitry A Filimonov, Alexey A Lagunin
{"title":"Sequence-structure based prediction of pathogenicity for amino acid substitutions in proteins associated with primary immunodeficiencies.","authors":"Ekaterina S Porfireva, Anton D Zadorozhny, Anastasia V Rudik, Dmitry A Filimonov, Alexey A Lagunin","doi":"10.3389/fimmu.2025.1492751","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Primary immunodeficiencies (PIDs) are a group of rare genetic disorders characterized by dysfunction of the immune system components. Early diagnosis and treatment are essential to prevent severe or life-threatening complications. PIDs are manifested by diverse clinical symptoms, posing challenges for accurate diagnosis. A key aspect of PID diagnosis is identifying specific amino acid substitutions in the proteins related with heritable diseases. In this study, we have developed classification sequence-structure-property relationships (SSPR) models for predicting the pathogenicity of amino acid substitutions (AAS) in 25 proteins associated with the most important and genetically studied PIDs and encoded genes: <i>IL2RG, JAK3, RAG1, RAG2, ADA, DCLRE1C, CD40LG, WAS, ATM, STAT3, KMT2D, BTK, FOXP3, AIRE, FAS, ELANE, ITGB2, CYBB, G6PD, GATA2, STAT1, IFIH1, NLRP3, MEFV, and SERPING1</i>.</p><p><strong>Methods: </strong>The data on 4825 pathogenic and benign AASs in the selected proteins were extracted from ClinVar and gnomAD. SSPR models were created for each protein using the MultiPASS software based on the Bayesian algorithm and different levels of MNA (Multilevel Neighborhoods of Atoms) descriptors for the representation of structural formulas of protein fragments including AAS.</p><p><strong>Results: </strong>The accuracy of prediction was assessed through a 5-fold cross-validation and compared to other bioinformatics tools, such as SIFT4G, Polyphen2 HDIV, FATHMM, MetaSVM, PROVEAN, ClinPred, and Alpha Missense. The best SSPR models demonstrated high accuracy, with an average ROC AUC of 0.831 ± 0.037, a Balanced accuracy of (0.763 ± 0.034), MCC (0.457 ± 0.06), and F-measure (0.623 ± 0.07) across all genes, outperforming the most popular bioinformatics tools.</p><p><strong>Conclusions: </strong>The best created SSPR models for the prediction of pathogenicity of amino acid substitutions related with PIDs have been implemented in a freely available web application SAV-Pred (Single Amino acid Variants Predictor, http://www.way2drug.com/SAV-Pred/), which may be a useful tool for medical geneticists and clinicians. The use of SAV-Pred for some clinical cases of PIDs are provided.</p>","PeriodicalId":12622,"journal":{"name":"Frontiers in Immunology","volume":"16 ","pages":"1492751"},"PeriodicalIF":5.7000,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11835853/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Immunology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fimmu.2025.1492751","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"IMMUNOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction: Primary immunodeficiencies (PIDs) are a group of rare genetic disorders characterized by dysfunction of the immune system components. Early diagnosis and treatment are essential to prevent severe or life-threatening complications. PIDs are manifested by diverse clinical symptoms, posing challenges for accurate diagnosis. A key aspect of PID diagnosis is identifying specific amino acid substitutions in the proteins related with heritable diseases. In this study, we have developed classification sequence-structure-property relationships (SSPR) models for predicting the pathogenicity of amino acid substitutions (AAS) in 25 proteins associated with the most important and genetically studied PIDs and encoded genes: IL2RG, JAK3, RAG1, RAG2, ADA, DCLRE1C, CD40LG, WAS, ATM, STAT3, KMT2D, BTK, FOXP3, AIRE, FAS, ELANE, ITGB2, CYBB, G6PD, GATA2, STAT1, IFIH1, NLRP3, MEFV, and SERPING1.
Methods: The data on 4825 pathogenic and benign AASs in the selected proteins were extracted from ClinVar and gnomAD. SSPR models were created for each protein using the MultiPASS software based on the Bayesian algorithm and different levels of MNA (Multilevel Neighborhoods of Atoms) descriptors for the representation of structural formulas of protein fragments including AAS.
Results: The accuracy of prediction was assessed through a 5-fold cross-validation and compared to other bioinformatics tools, such as SIFT4G, Polyphen2 HDIV, FATHMM, MetaSVM, PROVEAN, ClinPred, and Alpha Missense. The best SSPR models demonstrated high accuracy, with an average ROC AUC of 0.831 ± 0.037, a Balanced accuracy of (0.763 ± 0.034), MCC (0.457 ± 0.06), and F-measure (0.623 ± 0.07) across all genes, outperforming the most popular bioinformatics tools.
Conclusions: The best created SSPR models for the prediction of pathogenicity of amino acid substitutions related with PIDs have been implemented in a freely available web application SAV-Pred (Single Amino acid Variants Predictor, http://www.way2drug.com/SAV-Pred/), which may be a useful tool for medical geneticists and clinicians. The use of SAV-Pred for some clinical cases of PIDs are provided.
期刊介绍:
Frontiers in Immunology is a leading journal in its field, publishing rigorously peer-reviewed research across basic, translational and clinical immunology. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians and the public worldwide.
Frontiers in Immunology is the official Journal of the International Union of Immunological Societies (IUIS). Encompassing the entire field of Immunology, this journal welcomes papers that investigate basic mechanisms of immune system development and function, with a particular emphasis given to the description of the clinical and immunological phenotype of human immune disorders, and on the definition of their molecular basis.