Establishment and analysis of a novel diagnostic model for systemic juvenile idiopathic arthritis based on machine learning

Impact factor 2.8 · Zone 3 (Medicine) · JCR Q1 (PEDIATRICS)
Pan Ding, Yi Du, Xinyue Jiang, Huajian Chen, Li Huang
Published 2024-01-19. DOI: https://doi.org/10.1186/s12969-023-00949-x
Citations: 0

Abstract

Systemic juvenile idiopathic arthritis (SJIA) is a form of childhood arthritis with clinical features such as fever, lymphadenopathy, arthritis, rash, and serositis. It seriously affects the growth and development of children and has a high rate of disability and mortality. The precise cause of the disease is unknown; genetic, infectious, or autoimmune factors may all contribute. Our study aims to develop a gene-expression-based diagnostic model to explore the identification of SJIA at the genetic level. Gene expression datasets of peripheral blood mononuclear cell (PBMC) samples from SJIA patients were collected from the Gene Expression Omnibus (GEO) database. Three GEO datasets (GSE11907-GPL96, GSE8650-GPL96 and GSE13501) were merged and used as the training dataset, which included 125 SJIA samples and 92 healthy samples. GSE7753 was used as the validation dataset. The limma method was used to screen differentially expressed genes (DEGs). Feature selection was performed using Lasso, random forest (RF)-recursive feature elimination (RFE) and an RF classifier. We identified 4 key genes (ALDH1A1, CEACAM1, YBX3 and SLC6A8) that were essential to distinguish SJIA from healthy samples. Using these 4 genes, we performed a grid search with 10-fold cross-validation repeated 5 times to identify the RF model with the optimal mtry. The mean cross-validated area under the curve (AUC) was greater than 0.95. The model's performance was then assessed again on the validation dataset, giving an AUC of 0.990. These AUC values indicate that the SJIA diagnostic model is robust. We developed a new SJIA diagnostic model that can serve as a novel aid in identifying SJIA. In addition, the 4 key genes may serve as potential biomarkers for SJIA and provide new insights into its mechanisms.
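The evaluation scheme described in the abstract (repeated, stratified k-fold cross-validation scored by AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the data are synthetic, and a simple class-centroid projection stands in for the random forest, so no mtry grid search is performed.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC from the rank-sum (Mann-Whitney U) identity; ties share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def repeated_stratified_folds(labels, n_folds, n_repeats, seed=0):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles within each class."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % n_folds].append(i)  # deal samples round-robin
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield train_idx, sorted(test_idx)


def fit_centroid_scorer(X, y):
    """Stand-in model (an assumption, not a random forest): project each
    sample onto the difference between the two class means."""
    n_features = len(X[0])
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    direction = [
        mean(r[f] for r in pos) - mean(r[f] for r in neg)
        for f in range(n_features)
    ]
    return lambda row: sum(w * v for w, v in zip(direction, row))


def cross_validated_auc(X, y, n_folds=10, n_repeats=5, seed=0):
    """Mean held-out AUC over n_folds * n_repeats train/test splits."""
    fold_aucs = []
    for train_idx, test_idx in repeated_stratified_folds(y, n_folds, n_repeats, seed):
        score = fit_centroid_scorer([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_aucs.append(auc([y[i] for i in test_idx], [score(X[i]) for i in test_idx]))
    return mean(fold_aucs)


if __name__ == "__main__":
    # Synthetic data only: 125 "case" and 92 "control" rows with 4 features,
    # mirroring the sample and gene counts in the abstract, not the GEO data.
    rng = random.Random(42)
    y = [1] * 125 + [0] * 92
    X = [[rng.gauss(1.0 if label else 0.0, 1.0) for _ in range(4)] for label in y]
    print(f"mean AUC over 50 held-out folds: {cross_validated_auc(X, y):.3f}")
```

Stratifying the folds keeps both classes present in every held-out fold, which the AUC needs in order to be defined. Averaging over repeats reduces the dependence of the estimate on any single random partition.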
Source journal
Pediatric Rheumatology (PEDIATRICS-RHEUMATOLOGY)
CiteScore: 4.10
Self-citation rate: 8.00%
Articles published: 95
Review time: >12 weeks
Journal description: Pediatric Rheumatology is an open access, peer-reviewed, online journal encompassing all aspects of clinical and basic research related to pediatric rheumatology and allied subjects. The journal's scope of diseases and syndromes includes musculoskeletal pain syndromes, rheumatic fever and post-streptococcal syndromes, juvenile idiopathic arthritis, systemic lupus erythematosus, juvenile dermatomyositis, local and systemic scleroderma, Kawasaki disease, Henoch-Schonlein purpura and other vasculitides, sarcoidosis, inherited musculoskeletal syndromes, autoinflammatory syndromes, and others.