Multi-class machine learning-based classification of SCID-related genetic variants.

IF 3.1 · CAS Zone 4 (Medicine) · JCR Q3 (Immunology)

Immunologic Research · Pub Date: 2025-09-11 · DOI: 10.1007/s12026-025-09685-8

Ali Şahin, Gamze Sonmez, Mehmet Karaselek, İsmail Reisli

Immunologic Research, volume 73, issue 1, article 129. Publication type: Journal Article.

Citations: 0

Abstract

Background: Variants of uncertain significance (VUS) represent a major diagnostic challenge in the interpretation of genetic testing results, particularly in the context of inborn errors of immunity such as severe combined immunodeficiency (SCID). The inconsistency among computational prediction tools often necessitates expensive and time-consuming wet-lab analyses.

Objective: This study aimed to develop disease-specific, multi-class machine learning models using in silico scores to classify SCID-associated genetic variants and improve the interpretation of VUS.

Methods: Genes associated with SCID were identified from the 2024 update of the International Union of Immunological Societies (IUIS) classification of inborn errors of immunity. Missense variants were retrieved from ClinVar and labeled as benign, likely benign, likely pathogenic, or pathogenic. Variants classified as VUS or with conflicting interpretations were excluded. In silico functional prediction scores were collected for each variant. Multi-class classification models were developed using six machine learning algorithms: Random Forest, XGBoost, Gradient Boosting, AdaBoost, Support Vector Machine, and Logistic Regression. Performance was evaluated using five-fold cross-validation with five repeats (25 folds in total).
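The evaluation scheme described in the Methods (five-fold cross-validation repeated five times) can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract does not say whether the folds were stratified, so stratification by class is assumed here, and the per-class counts are hypothetical placeholders chosen only so that they sum to the reported 537 variants. The function name `repeated_stratified_kfold` is likewise made up for this sketch.

```python
import random
from collections import defaultdict

CLASSES = ["benign", "likely benign", "likely pathogenic", "pathogenic"]

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx), keeping class shares similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0  # running position so fold sizes differ by at most one
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)  # new shuffle on every repeat
            for pos, idx in enumerate(members):
                folds[(offset + pos) % n_splits].append(idx)
            offset += len(members)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)

# Hypothetical class counts (NOT from the paper); they only sum to 537.
hypothetical_counts = [150, 120, 130, 137]
labels = [c for c, n in zip(CLASSES, hypothetical_counts) for _ in range(n)]

splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 5 folds x 5 repeats = 25 train/test splits
```

Each of the 25 splits would be used to fit a model on `train_idx` and score it on `test_idx`; the per-split scores are then aggregated into a single summary figure.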

Results: A total of 537 variants from 71 genes were included in the final dataset. Among the models, Random Forest achieved the best performance with an accuracy of 0.70 ± 0.03 and the highest area under the receiver operating characteristic curve (AUROC: 0.90 ± 0.01). MetaRNN, BayesDel_addAF, and REVEL were the most predictive features.
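To make the reported metrics concrete, the following sketch shows one common way to compute a multi-class AUROC and to summarize per-fold scores as mean ± spread. It rests on assumptions not confirmed by the abstract: the AUROC is taken as the macro average of one-vs-rest AUROCs, and the "±" is read as a sample standard deviation across folds. The function names, toy probabilities, and fold scores are all hypothetical.

```python
from statistics import mean, stdev

def binary_auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def macro_ovr_auroc(labels, probas, classes):
    """Mean of one-vs-rest AUROCs; probas[i][c] is the score of classes[c] for sample i."""
    per_class = []
    for c, cls in enumerate(classes):
        y_bin = [1 if lab == cls else 0 for lab in labels]
        per_class.append(binary_auroc(y_bin, [row[c] for row in probas]))
    return mean(per_class)

def summarize(fold_scores):
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{mean(fold_scores):.2f} ± {stdev(fold_scores):.2f}"

classes = ["benign", "likely benign", "likely pathogenic", "pathogenic"]
toy_labels = classes[:]  # one toy sample per class
toy_probas = [           # hypothetical predicted probabilities
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.5, 0.3],
    [0.1, 0.1, 0.2, 0.6],
]
toy_auroc = macro_ovr_auroc(toy_labels, toy_probas, classes)
print(toy_auroc)
print(summarize([0.88, 0.90, 0.92]))  # hypothetical fold scores, not the paper's
```

Accuracy and AUROC measure different things: accuracy counts only the top-ranked class, while AUROC rewards correct ranking of class scores, which is one reason a four-class model can pair an accuracy of 0.70 with an AUROC of 0.90.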

Conclusion: This study demonstrates that disease-specific, multi-class machine learning models leveraging in silico scores can effectively support the classification of SCID-related variants, offering a promising tool for improving VUS interpretation.

Key messages: Multi-class machine learning models can enhance the interpretation of SCID-related VUS. Random Forest showed the highest diagnostic accuracy and robustness among tested models. Disease-specific modeling improves classification performance despite limited datasets.

Capsule Summary: This study developed disease-specific multi-class machine learning models to classify SCID-related variants using in silico scores, with Random Forest showing the strongest performance in predicting variant pathogenicity.




Source journal

Immunologic Research (Medicine: Immunology)

CiteScore: 6.90

Self-citation rate: 0.00%

Review time: 6-12 weeks

About the journal: IMMUNOLOGIC RESEARCH represents a unique medium for the presentation, interpretation, and clarification of complex scientific data. Information is presented in the form of interpretive synthesis reviews, original research articles, symposia, editorials, and theoretical essays. The scope of coverage extends to cellular immunology, immunogenetics, molecular and structural immunology, immunoregulation and autoimmunity, immunopathology, tumor immunology, host defense and microbial immunity, including viral immunology, immunohematology, mucosal immunity, complement, transplantation immunology, clinical immunology, neuroimmunology, immunoendocrinology, immunotoxicology, translational immunology, and history of immunology.