Waqar Ali, Jonathan Williams, Betty Xiong, James Zou, Roxana Daneshjou
Machine Learning for Early Detection of Hidradenitis Suppurativa: A Feasibility Study Using Medical Insurance Claims Data
JID Innovations: Skin Science from Molecules to Population Health, 5(3), Article 100362. Published 2025-03-10. DOI: 10.1016/j.xjidi.2025.100362
https://www.sciencedirect.com/science/article/pii/S2667026725000189
Abstract
Patients with hidradenitis suppurativa (HS) are often misdiagnosed and may wait up to 10 years to receive a diagnosis. This study aimed to predict HS before the actual diagnosis from prior medical history, using models developed with insurance claims data. Three machine learning models were compared with a model using features selected by a dermatologist (clinical baseline model). The study analyzed the insurance records of 5,900,000 United States individuals over 13.5 years. The population included 13,886 patients with HS who had at least 1 claim in each of the 2 years prior to their first HS diagnosis and 69,428 control patients with no HS diagnosis. The models aimed to classify HS diagnosis status on the basis of clinical features observed over 2 years. Model performance was assessed by area under the receiver operating characteristic curve (AUROC), F1-score, precision, and recall. All three machine learning models (logistic regression, random forest, and XGBoost) showed a higher AUROC than the clinical baseline model (logistic regression = 0.75, random forest = 0.79, XGBoost = 0.80, clinical = 0.71). In both the clinical model and the best-performing XGBoost model, the top features associated with diagnosis were patient age at prediction and sex. The top features of the XGBoost model also included use of sulfamethoxazole/trimethoprim and clindamycin phosphate, as well as obesity.
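The evaluation metrics named in the abstract (area under the receiver operating characteristic curve, or AUROC, plus precision, recall, and F1-score) can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. Only the cohort counts (13,886 cases and 69,428 controls) come from the abstract.

```python
from typing import Sequence, Tuple

# Cohort sizes reported in the abstract: roughly 5 controls per HS case.
HS_CASES = 13_886
CONTROLS = 69_428
CONTROLS_PER_CASE = CONTROLS / HS_CASES


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    The pairwise loop is O(n_pos * n_neg): fine for a toy example,
    too slow for a claims-scale cohort.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f1(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 at a fixed threshold (0.5 is an assumed default)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data with the same 1:5 case-to-control ratio (2 cases, 10 controls).
toy_labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.1, 0.6, 0.3, 0.55, 0.05, 0.35, 0.15, 0.25, 0.45]

toy_auroc = auroc(toy_labels, toy_scores)
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_labels, toy_scores)
```

AUROC is threshold-free and summarizes ranking quality, whereas precision, recall, and F1 depend on the chosen cutoff. With about five controls per case, reporting both kinds of metric gives a fuller picture than either alone.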