Validated Models Using EHRs or Claims Data to Distinguish Diabetes Type among Adults
JR Campione
Advances in Diabetes & Endocrinology
DOI: 10.13188/2475-5591.1000018 (https://doi.org/10.13188/2475-5591.1000018)
Citations: 0
Abstract
Purpose: Clinical data offer an opportunity for efficient and timely disease surveillance. We developed and validated advanced phenotyping models that classify adult patients with diabetes as type 1, type 2, or other/indeterminate using structured fields from electronic health record (EHR) data. To simulate the use of claims data supplemented with medication information, we compared model performance before and after removing body mass index (BMI) and laboratory results.

Methods: We used 3 years of EHR data from a sample of 2,465 adult patients with diabetes drawn from a health care system’s clinical data warehouse. We created a weighted ratio of type 1 diabetes codes to all diabetes codes by down-weighting codes from care settings that do not treat diabetes. We developed two multinomial regression models and a machine learning conditional inference tree to classify patients as type 1, type 2, or other/indeterminate. We validated the models by calculating sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) relative to a gold standard.

Results: In all models, the weighted ratio of type 1 diabetes codes was the strongest predictor. Validation statistics were ≥ 93% for sensitivity, ≥ 87% for specificity, ≥ 88% for PPV, and ≥ 93% for NPV. After BMI and laboratory data were removed from the regression model, the largest decline in performance relative to the full model was in type 2 diabetes specificity (90.8% to 89.2%).

Conclusion: Prediction models and machine learning conditional inference trees that use either structured fields from EHR data or claims data supplemented with medication data can accurately distinguish diabetes type among adults. Including BMI and laboratory results improves model specificity for type 2 diabetes.
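
The two computational steps named in the Methods, the weighted code ratio and the per-type validation statistics, can be illustrated with a minimal Python sketch. It is not the study's implementation. The function names, the care-setting labels, and the weight values are hypothetical, since the abstract does not report the weights used, and the validation statistics are computed one class versus the rest, which is one common way to obtain sensitivity, specificity, PPV, and NPV for a three-class outcome.

```python
def weighted_type1_ratio(codes, setting_weights, default_weight=1.0):
    """Weighted share of type 1 codes among all diabetes codes for one patient.

    codes: list of (diabetes_type, care_setting) tuples, one per diagnosis code.
    setting_weights: hypothetical mapping of care setting -> weight; settings
    that do not treat diabetes would receive a weight below 1.
    Returns None when the patient has no weighted diabetes codes.
    """
    type1_weight = 0.0
    total_weight = 0.0
    for diabetes_type, care_setting in codes:
        weight = setting_weights.get(care_setting, default_weight)
        total_weight += weight
        if diabetes_type == "type1":
            type1_weight += weight
    if total_weight == 0:
        return None
    return type1_weight / total_weight


def one_vs_rest_metrics(gold, predicted, positive_class):
    """Sensitivity, specificity, PPV, and NPV for one class against all others."""
    tp = fp = tn = fn = 0
    for gold_label, predicted_label in zip(gold, predicted):
        if predicted_label == positive_class and gold_label == positive_class:
            tp += 1
        elif predicted_label == positive_class:
            fp += 1
        elif gold_label == positive_class:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # An empty denominator leaves the statistic undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the study.
    example_codes = [
        ("type1", "endocrinology"),
        ("type1", "endocrinology"),
        ("type2", "emergency"),
        ("type2", "emergency"),
    ]
    example_weights = {"endocrinology": 1.0, "emergency": 0.5}
    print(weighted_type1_ratio(example_codes, example_weights))

    gold = ["type1", "type1", "type2", "type2", "other"]
    predicted = ["type1", "type2", "type2", "type2", "type1"]
    for label in ("type1", "type2", "other"):
        print(label, one_vs_rest_metrics(gold, predicted, label))
```

In this sketch the ratio would serve as one input feature to a classifier, and calling `one_vs_rest_metrics` once per diabetes type yields the four statistics separately for each type.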