Alexander Rudge, Neil McHugh, William Tillett, Theresa Smith
Annals of the Rheumatic Diseases. Published 2025-03-01. DOI: 10.1016/j.ard.2025.01.051 (https://doi.org/10.1016/j.ard.2025.01.051)
An interpretable machine learning approach for detecting psoriatic arthritis in a UK primary care psoriasis cohort using electronic health records from the Clinical Practice Research Datalink.
Objectives: Develop an interpretable machine learning model to detect patients with newly diagnosed psoriatic arthritis (PsA) in a cohort of psoriasis patients and identify important clinical indicators of PsA in primary care.
Methods: We developed models using UK primary care electronic health records from the Clinical Practice Research Datalink (CPRD). The study population consisted of a cohort of PsA-free patients with incident psoriasis who were followed prospectively. We used Bayesian networks (BNs) to identify patients who developed PsA using primary care variables measured prior to diagnosis and compared the results to a random forest (RF). Variables included patient demographics, musculoskeletal symptoms, blood tests, and prescriptions. The importance of each variable used in the models was evaluated using permutation variable importance. Model discrimination was measured using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (PRAUC).
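The evaluation tools named in the Methods (AUC, PRAUC, and permutation variable importance) can be illustrated with a minimal Python sketch. This is not the authors' code: the toy data, the `toy_score` function standing in for a fitted BN or RF, and the use of average precision as the PRAUC estimate are all assumptions made for illustration.

```python
import random

def roc_auc(y_true, scores):
    """AUC: probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (ties broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at each recalled positive
    return total / n_pos

def permutation_importance(predict, X, y, metric, n_repeats=20, seed=0):
    """Mean drop in `metric` when one column of X is shuffled, for each column."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total_drop += baseline - metric(y, [predict(r) for r in X_perm])
        drops.append(total_drop / n_repeats)
    return baseline, drops

# Hypothetical toy data: column 0 carries signal, column 1 is unrelated noise.
X = [[i / 100, (i * 37 % 100) / 100] for i in range(100)]
y = [1 if (i >= 90 or i in (50, 70)) else 0 for i in range(100)]

def toy_score(row):
    """Stand-in for a fitted model's predicted risk; it uses only column 0."""
    return row[0]

base_auc, auc_drops = permutation_importance(toy_score, X, y, roc_auc)
base_ap, ap_drops = permutation_importance(toy_score, X, y, average_precision)
print("AUC:", round(base_auc, 3), "drops per column:", [round(d, 3) for d in auc_drops])
print("PRAUC:", round(base_ap, 3), "drops per column:", [round(d, 3) for d in ap_drops])
```

In this sketch, shuffling the informative column lowers both metrics, while shuffling the unused column leaves them unchanged, which is the pattern permutation importance is designed to expose.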
Results: We identified a cohort of 122,330 patients with an incident psoriasis diagnosis between 1998 and 2019 in the CPRD, of whom 2,460 went on to develop PsA. Our best BN achieved an AUC of 0.823 and a PRAUC of 0.221, compared with an AUC of 0.851 and a PRAUC of 0.261 for the RF. Psoriasis duration, nonsteroidal anti-inflammatory drug prescriptions, nonspecific arthritis, nonspecific arthralgia, and C-reactive protein blood tests were all important variables in our models.
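To put the PRAUC values in context, note that a classifier with no discriminative ability has an expected PRAUC equal to the outcome prevalence. The short calculation below compares the reported values with that baseline. It assumes the evaluation data had the same PsA prevalence as the full cohort, which the abstract does not state.

```python
# Figures taken from the Results; the equal-prevalence assumption is ours.
n_cohort = 122_330
n_psa = 2_460

prevalence = n_psa / n_cohort  # chance-level PRAUC under the stated assumption
reported_prauc = {"BN": 0.221, "RF": 0.261}
lift = {name: value / prevalence for name, value in reported_prauc.items()}

print(f"Prevalence (chance-level PRAUC): {prevalence:.4f}")
for name, value in reported_prauc.items():
    print(f"{name}: PRAUC {value:.3f}, about {lift[name]:.1f}x the chance baseline")
```

Under that assumption, both models sit roughly 11 to 13 times above the chance baseline, even though the absolute PRAUC values look small next to the AUCs.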
Conclusions: We were able to identify psoriasis patients at higher risk of PsA, along with important indicators of PsA, in UK primary care. Further work is required to evaluate our model's usefulness in assisting PsA screening.
About the journal:
Annals of the Rheumatic Diseases (ARD) is an international peer-reviewed journal covering all aspects of rheumatology, which includes the full spectrum of musculoskeletal conditions, arthritic disease, and connective tissue disorders. ARD publishes basic, clinical, and translational scientific research, including the most important recommendations for the management of various conditions.