Multi-class machine learning-based classification of SCID-related genetic variants
Ali Şahin, Gamze Sonmez, Mehmet Karaselek, İsmail Reisli
Immunologic Research, 73(1):129. Published 2025-09-11.
DOI: https://doi.org/10.1007/s12026-025-09685-8
Abstract
Background: Variants of uncertain significance (VUS) represent a major diagnostic challenge in the interpretation of genetic testing results, particularly in the context of inborn errors of immunity such as severe combined immunodeficiency (SCID). The inconsistency among computational prediction tools often necessitates expensive and time-consuming wet-lab analyses.
Objective: This study aimed to develop disease-specific, multi-class machine learning models using in silico scores to classify SCID-associated genetic variants and improve the interpretation of VUS.
Methods: Genes associated with SCID were identified from the 2024 update of the International Union of Immunological Societies (IUIS) classification. Missense variants were retrieved from ClinVar and labeled as benign, likely benign, likely pathogenic, or pathogenic. Variants classified as VUS or with conflicting interpretations were excluded. In silico functional prediction scores were collected for each variant. Multi-class classification models were developed using six machine learning algorithms: Random Forest, XGBoost, Gradient Boosting, AdaBoost, Support Vector Machine, and Logistic Regression. Performance was evaluated using five-fold cross-validation with five repeats (25 folds).
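To illustrate how five-fold cross-validation with five repeats yields 25 evaluation folds, here is a minimal standard-library sketch. It assumes the splits were stratified by class, which the abstract does not state. The function name `repeated_stratified_kfold`, the seed, and the placeholder labels are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles within class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            # Deal class members round-robin so each fold keeps class proportions.
            for pos, idx in enumerate(members):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx

CLASSES = ["benign", "likely benign", "likely pathogenic", "pathogenic"]
# Placeholder labels only: 537 items cycled over the four classes.
# This is NOT the study's class distribution, which the abstract does not report.
labels = [CLASSES[i % 4] for i in range(537)]
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 5 splits x 5 repeats
```

Each of the 25 splits would be used to fit a model on `train_idx` and score it on `test_idx`; the per-fold scores are then aggregated.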
Results: A total of 537 variants from 71 genes were included in the final dataset. Among the models, Random Forest achieved the best performance with an accuracy of 0.70 ± 0.03 and the highest area under the receiver operating characteristic curve (AUROC: 0.90 ± 0.01). MetaRNN, BayesDel_addAF, and REVEL were the most predictive features.
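The reported figures have the form "mean ± spread" over the 25 folds, and AUROC for a four-class problem needs an averaging scheme. The sketch below shows one common way to compute both. It assumes macro-averaged one-vs-rest AUROC and a sample standard deviation, neither of which the abstract confirms. All function names and the toy inputs are hypothetical.

```python
from statistics import mean, stdev

def binary_auroc(scores, is_positive):
    """AUROC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auroc(y_true, proba, classes):
    """Unweighted mean of the one-vs-rest AUROC over all classes."""
    return mean(
        binary_auroc([row[k] for row in proba], [y == c for y in y_true])
        for k, c in enumerate(classes)
    )

def summarize(fold_scores):
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{mean(fold_scores):.2f} ± {stdev(fold_scores):.2f}"

CLASS_NAMES = ["benign", "likely benign", "likely pathogenic", "pathogenic"]
# Toy inputs for illustration only; these are not results from the study.
toy_truth = CLASS_NAMES[:]
toy_proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
toy_auroc = macro_ovr_auroc(toy_truth, toy_proba, CLASS_NAMES)
toy_summary = summarize([0.70, 0.72, 0.68])
```

In the toy case every class is perfectly separated, so the macro AUROC is 1.0. A real evaluation would pass the predicted class probabilities from each held-out fold instead.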
Conclusion: This study demonstrates that disease-specific, multi-class machine learning models leveraging in silico scores can effectively support the classification of SCID-related variants, offering a promising tool for improving VUS interpretation.
Key messages: Multi-class machine learning models can enhance the interpretation of SCID-related VUS. Random Forest showed the highest diagnostic accuracy and robustness among tested models. Disease-specific modeling improves classification performance despite limited datasets.
Capsule summary: This study developed disease-specific multi-class machine learning models to classify SCID-related variants using in silico scores, with Random Forest showing the strongest performance in predicting variant pathogenicity.
About the journal:
IMMUNOLOGIC RESEARCH represents a unique medium for the presentation, interpretation, and clarification of complex scientific data. Information is presented in the form of interpretive synthesis reviews, original research articles, symposia, editorials, and theoretical essays. The scope of coverage extends to cellular immunology, immunogenetics, molecular and structural immunology, immunoregulation and autoimmunity, immunopathology, tumor immunology, host defense and microbial immunity, including viral immunology, immunohematology, mucosal immunity, complement, transplantation immunology, clinical immunology, neuroimmunology, immunoendocrinology, immunotoxicology, translational immunology, and history of immunology.