Integrative exome sequencing and machine learning identify MICB and interferon pathway genes as contributors to SSc risk
Shamika Ketkar, Hongzheng Dai, Lindsay Burrage, David Murdock, Brian Dawson, Marialbert Acosta-Herrera, Martin Kerick, Javier Martin, Kevin Wilhelm, Jennifer Kay Asmussen, Olivier Lichtarge, Regeneron Genetics Center, Shervin Assassi, Maureen D Mayes, Brendan H Lee
Annals of the Rheumatic Diseases, published 2025-06-12. DOI: 10.1016/j.ard.2025.05.009
Abstract
Objectives: Systemic sclerosis (SSc) is a complex autoimmune disease with both known and unidentified genetic contributors. While genome-wide association studies (GWAS) have implicated multiple loci, many reside in noncoding regions. We aimed to identify novel protein-coding variants and pathogenic pathways using exome sequencing (ES) integrated with an Evolutionary Action-Machine Learning (EAML) framework, single-cell RNA sequencing (scRNA-seq), and expression quantitative trait locus (eQTL) analysis.
Methods: GWAS was conducted in 2,559 SSc cases and 893 controls of Caucasian ancestry, with replication in 9,846 cases and 18,333 controls of European ancestry. EAML prioritized genes with high-impact missense variants predictive of disease. Public scRNA-seq data from SSc and control skin biopsies were analyzed to localize gene expression across cell types. Whole blood eQTL data were used to identify regulatory effects of risk variants.
Results: A novel SSc risk locus at MICB (rs2516497, P = 3.66 × 10⁻¹³) was identified and replicated. EAML highlighted 284 genes enriched in interferon signaling. scRNA-seq localized MICB and NOTCH4 to fibroblasts and endothelial cells, while HLA class II genes were enriched in macrophages and fibroblasts. eQTL analysis confirmed regulatory effects at MICB, NOTCH4, and other prioritized genes, linking SSc-associated variants to transcriptional dysregulation.
Conclusions: This integrative genomic study identifies novel risk loci and mechanistic pathways in SSc, highlighting MICB, NOTCH4, and interferon-related genes. The findings provide insight into the cellular and regulatory architecture of SSc and support the utility of combining ES, machine learning, scRNA-seq, and eQTL data in complex disease genetics.
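Illustrative sketch: the abstract does not say which association test was used, so the Python below shows one common choice for a single-variant case-control comparison, an allelic chi-square test on a 2×2 table of allele counts. The function name, the allele counts, and the use of the conventional 5 × 10⁻⁸ genome-wide threshold are assumptions made for illustration; none of them come from the study.

```python
import math

# Conventional genome-wide significance threshold (an assumption here;
# the abstract does not state the threshold the authors applied).
GENOME_WIDE_ALPHA = 5e-8


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi2, p_value) for a 2x2 allele-count table.

    Rows are cases/controls, columns are alternate/reference allele counts.
    The p-value is for a chi-square statistic with 1 degree of freedom.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts, NOT data from the study.
odds_ratio, chi2, p_value = allelic_association(30, 70, 10, 90)
print(f"OR={odds_ratio:.2f}, chi2={chi2:.2f}, p={p_value:.2e}")
print("genome-wide significant:", p_value < GENOME_WIDE_ALPHA)

# The P value reported in the abstract for rs2516497, compared against
# the same conventional threshold.
print("reported MICB signal passes threshold:", 3.66e-13 < GENOME_WIDE_ALPHA)
```

In practice, GWAS pipelines usually fit a regression model with covariates such as ancestry principal components rather than a raw allelic test; the sketch only shows the basic logic of comparing allele frequencies between cases and controls.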
About the journal:
Annals of the Rheumatic Diseases (ARD) is an international peer-reviewed journal covering all aspects of rheumatology, which includes the full spectrum of musculoskeletal conditions, arthritic disease, and connective tissue disorders. ARD publishes basic, clinical, and translational scientific research, including the most important recommendations for the management of various conditions.