Machine learning and bioinformatics analysis to identify and validate diagnostic model associated with immune infiltration in rheumatoid arthritis.
Authors: Jiayang Jin, Xiaohong Xiang, Xuanlin Cai, Yuke Hou, Zhaoqi Zhang, Jing Li
Journal: Clinical Rheumatology (impact factor 2.9; JCR Q2, Rheumatology; category: Medicine)
Publication date: 2025-06-11
Publication type: Journal Article
DOI: 10.1007/s10067-025-07514-9 (https://doi.org/10.1007/s10067-025-07514-9)
Citations: 0
Abstract
Background: Rheumatoid arthritis (RA) is a chronic autoinflammatory condition that can result in significant disability. This study focuses on identifying immune infiltration-related diagnostic biomarkers for RA.
Method: Publicly available datasets from the Gene Expression Omnibus (GEO) were analyzed using the ssGSEA and CIBERSORT algorithms to measure immune cell subset infiltration. Functional enrichment analyses, including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), were performed. Additionally, least absolute shrinkage and selection operator (LASSO) regression and machine learning methods, such as random forest, were employed to identify key immune infiltration-related genes. Differential expression of these hub genes between subgroups was compared, and their diagnostic potential was evaluated through receiver operating characteristic (ROC) analysis and validated in the GSE93777 and GSE205962 datasets.
Results: Analysis of mRNA expression from GSE93272 revealed two distinct clusters: immunity_low (38 samples) and immunity_high (194 samples). A total of 320 differentially expressed genes (DEGs) were identified by intersecting the DEGs between these two clusters with the DEGs between RA patients and healthy controls (HC). Five hub genes (BMX, BTLA, CENPK, CMPK2, GBP3) were selected using LASSO and machine learning approaches, forming the basis of a diagnostic risk model. This five-gene model demonstrated strong diagnostic performance for distinguishing immune infiltration statuses (AUC = 0.977) and identifying RA patients (AUC = 0.942). External validation with the GSE93777 (AUC = 0.807) and GSE205962 (AUC = 0.938) datasets confirmed its reliability.
Conclusion: Five key genes associated with immune infiltration were identified, enabling the construction of a diagnostic model for RA. This model shows potential to improve RA diagnosis and facilitate the development of personalized therapeutic strategies.
Key Points
• RA patients were stratified into two distinct immune subtypes (Immunity_H and Immunity_L) based on ssGSEA analysis of 29 immune gene sets, revealing marked differences in immune activity and HLA gene expression.
• Five hub genes (BMX, BTLA, CENPK, CMPK2, and GBP3) were identified through LASSO and Random Forest algorithms, forming a robust risk model that accurately distinguishes RA patients and their immune subtypes.
• The predictive model was validated in two independent external cohorts, confirming its diagnostic reliability and generalizability across RA datasets.
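The abstract evaluates the five-gene risk model by ROC analysis and reports AUC values. As a minimal sketch of what such an evaluation computes, the Python below scores samples with a linear risk score and derives the AUC as the fraction of (positive, negative) pairs ranked correctly. The weights and expression values are invented purely for illustration; they are not the study's coefficients or data, and the abstract does not state that the authors' risk score has this exact form.

```python
from typing import Sequence

GENES = ["BMX", "BTLA", "CENPK", "CMPK2", "GBP3"]

# HYPOTHETICAL weights, one per gene in GENES; not taken from the study.
WEIGHTS = [0.5, -0.5, 0.25, 0.75, 0.25]


def risk_score(expression: Sequence[float], weights: Sequence[float]) -> float:
    """Linear risk score: weighted sum of per-gene expression values."""
    if len(expression) != len(weights):
        raise ValueError("expression and weights must have the same length")
    return sum(w * x for w, x in zip(weights, expression))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# TOY expression profiles (order follows GENES) with labels: 1 = RA, 0 = control.
samples = [
    ([2.0, 1.0, 2.0, 3.0, 2.0], 1),
    ([1.5, 1.0, 1.0, 2.0, 1.0], 1),
    ([1.0, 2.0, 1.0, 1.0, 3.0], 1),
    ([1.0, 2.0, 1.0, 1.0, 1.0], 0),
    ([1.0, 1.0, 1.0, 1.0, 2.0], 0),
]

scores = [risk_score(expr, WEIGHTS) for expr, _ in samples]
labels = [label for _, label in samples]
auc = roc_auc(scores, labels)
print(f"AUC on toy data: {auc:.3f}")
```

The pairwise definition used here is equivalent to the area under the ROC curve and keeps the sketch dependency-free; for real expression matrices, an established implementation in a statistics library would be the more practical choice.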
About the journal:
Clinical Rheumatology is an international English-language journal devoted to publishing original clinical investigation and research in the general field of rheumatology with accent on clinical aspects at postgraduate level.
The journal succeeds Acta Rheumatologica Belgica, originally founded in 1945 as the official journal of the Belgian Rheumatology Society. Clinical Rheumatology aims to cover all modern trends in clinical and experimental research as well as the management and evaluation of diagnostic and treatment procedures connected with the inflammatory, immunologic, metabolic, genetic and degenerative soft and hard connective tissue diseases.