Fatemeh Salehi, Sara Bayat, Georg Schett, Arnd Kleyer, Thomas Altstidl, Bjoern M Eskofier
ExSMART-PreRA: Explainable Survival and Risk Assessment Using Machine Learning for Time Estimation in Preclinical Rheumatoid Arthritis
IEEE Journal of Biomedical and Health Informatics, published 2025-03-26. DOI: 10.1109/JBHI.2025.3554364 (https://doi.org/10.1109/JBHI.2025.3554364)
Citations: 0
Abstract
Rheumatoid arthritis (RA) is a chronic inflammatory autoimmune disease affecting peripheral joints. Before clinical diagnosis, individuals may carry certain antibodies and experience discomfort without showing specific signs of RA or inflamed joints. This stage is termed "preclinical RA," because these individuals are at risk of developing the disease. Because this early stage is difficult to define, individual risk models are needed. This study aims to estimate the time to and risk of RA onset using various survival machine learning models. After identifying the best model, we stratified patients into risk categories and identified key risk factors. Data from 154 anonymized preclinical RA patients were collected and analyzed. Several survival analysis models were evaluated, including Survival Tree, Random Survival Forest, Extreme Gradient Boosting Survival, Linear Multi-Task Model, Neural Multi-Task Model, Support Vector Machines, and Cox Proportional Hazards. The Random Survival Forest model outperformed the others, achieving a mean C-index of 0.798. Using this model, patients were stratified into low-, medium-, and high-risk groups, which allows clinical visits to be scheduled individually according to RA risk. To make the model interpretable, SHapley Additive Explanations (SHAP) were used to identify key risk factors. The baseline level of rheumatoid factor (RF) antibodies was the most important predictor. Higher baseline levels of anti-cyclic citrullinated peptide (anti-CCP) and RF antibodies were linked to earlier RA onset. This method provides insight into key factors that might be overlooked in clinical practice and can improve patient management and quality of life for those at risk of developing RA.
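The abstract reports model quality as a C-index and describes grouping patients into three risk levels. As a rough illustration of what those two steps compute, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the tie handling, the tertile cut-off rule, and the toy data are all assumptions made for this example, and the abstract does not say how the risk thresholds were chosen.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times: observed follow-up time per patient
    events: 1 if RA onset was observed, 0 if the patient was censored
    risk_scores: model output, where higher means earlier expected onset
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an event.
        # Pairs with tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier event got the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def stratify_by_tertile(risk_scores):
    """Label each score low/medium/high by rank tertile (an assumed rule)."""
    n = len(risk_scores)
    order = sorted(range(n), key=lambda k: risk_scores[k])
    labels = [None] * n
    for rank, k in enumerate(order):
        labels[k] = ("low", "medium", "high")[min(3 * rank // n, 2)]
    return labels


# Hypothetical toy data, not taken from the study.
times = [5, 12, 20, 30, 36, 48]          # months of follow-up
events = [1, 1, 0, 1, 0, 0]              # 1 = RA onset observed
scores = [0.9, 0.7, 0.8, 0.4, 0.3, 0.1]  # predicted risk

c_index = concordance_index(times, events, scores)
groups = stratify_by_tertile(scores)
print(f"C-index: {c_index:.3f}")
print(groups)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of onset times, which gives context for the reported mean of 0.798. In practice a survival library would typically be used for both the model and the metric; the pairwise loop here is only meant to make the definition concrete.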
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.