Developing and Evaluating a Natural Language Processing Algorithm to Detect Insulin Pump and Continuous Glucose Monitor Use in Electronic Health Records of Patients with Type 1 Diabetes.
Estelle Everett, Ryan A Tiu, Bing Zhu, Jeffrey Feng, Nicholas Jackson, Joyce Graham, Eli Ipp, Nestoras Mathioudakis, Alex A T Bui, Tannaz Moin
Diabetes Technology & Therapeutics. Journal Article. Published 2025-09-29. DOI: 10.1177/15209156251383828 (https://doi.org/10.1177/15209156251383828). Impact factor: 6.3; JCR Q1 (Endocrinology & Metabolism).
Abstract
Objective: We aimed to develop and validate natural language processing (NLP) algorithms to identify insulin pump and continuous glucose monitor (CGM) users using unstructured clinical note data from the electronic health record (EHR).
Methods: We reviewed a random sample of outpatient clinical notes from endocrinologists to catalog how insulin pump and CGM use was documented. We translated these patterns into regular expressions and used them to build rule-based NLP algorithms, which we iteratively refined. We evaluated the final algorithms in a University of California Los Angeles (UCLA) holdout dataset that included the most recent note from 667 unique patients. We then externally validated the algorithms in a second health system with a different EHR and patient population. Manual chart review served as the gold standard. We assessed performance with measures including sensitivity and specificity. To contextualize algorithm performance, we evaluated the accuracy of billing codes for insulin pump and CGM use within the same UCLA holdout dataset.
Results: In the UCLA holdout dataset, our insulin pump algorithm achieved a sensitivity of 0.90 and specificity of 0.89. The CGM algorithm achieved a sensitivity of 0.85 and specificity of 0.84. The combined algorithm identifying both insulin pump and CGM use showed a sensitivity of 0.76 and specificity of 0.92. In comparison, billing codes underperformed: International Classification of Diseases/Current Procedural Terminology (CPT) codes identified insulin pump use with a sensitivity of 0.09 and specificity of 1.00, whereas CPT codes identified CGM use with a sensitivity of 0.68 and specificity of 0.86. For combined device use, billing codes had a sensitivity of 0.06 and specificity of 1.00. External validation demonstrated similarly strong algorithm performance in the second health system.
Conclusions: We showed that NLP can accurately identify insulin pump and CGM users from unstructured EHR notes, substantially outperforming billing code-based methods. This scalable approach can support system- and population-level evaluations of diabetes technologies.
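The abstract does not give the actual regular expressions or rules, so the following is only a minimal sketch of the general approach it describes: a rule-based classifier built from regular expressions, scored against manual chart review using sensitivity and specificity. The patterns, the negation rule, the function names (`mentions_use`, `sensitivity_specificity`), and the four toy notes with their labels are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical patterns for illustration; the study's real expressions
# are not published in the abstract.
PUMP_PATTERN = re.compile(r"\b(?:insulin\s+pump|csii)\b", re.IGNORECASE)
CGM_PATTERN = re.compile(
    r"\b(?:cgm|continuous\s+glucose\s+monitor(?:s|ing)?|dexcom|libre)\b",
    re.IGNORECASE,
)
# Assumed negation rule: a negation cue earlier in the same clause
# (no '.', ';' or newline between the cue and the device mention).
NEGATION = re.compile(
    r"\b(?:no|not|never|without|denies|declined|declines|stopped|discontinued)\b"
    r"[^.;\n]{0,40}$",
    re.IGNORECASE,
)


def mentions_use(note: str, pattern: re.Pattern) -> bool:
    """Return True if the note has at least one non-negated device mention."""
    for match in pattern.finditer(note):
        preceding = note[max(0, match.start() - 60):match.start()]
        if not NEGATION.search(preceding):
            return True
    return False


def sensitivity_specificity(predicted, gold):
    """Compute (sensitivity, specificity) against gold-standard labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    tn = sum(1 for p, g in zip(predicted, gold) if not p and not g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy notes and chart-review labels, invented for this example.
NOTES = [
    "Continues on insulin pump therapy with continuous glucose monitoring.",
    "Patient is not on an insulin pump. Uses Dexcom G6 CGM.",
    "No CGM at this time; on multiple daily injections.",
    "Discussed starting an insulin pump at next visit.",
]
GOLD_PUMP = [True, False, False, False]

pred_pump = [mentions_use(note, PUMP_PATTERN) for note in NOTES]
pred_cgm = [mentions_use(note, CGM_PATTERN) for note in NOTES]
pump_sens, pump_spec = sensitivity_specificity(pred_pump, GOLD_PUMP)
print(f"pump sensitivity={pump_sens:.2f} specificity={pump_spec:.2f}")
```

The last toy note is deliberately one that this simple rule misclassifies: a planned pump start is flagged as current use. Cases of this kind are one plausible reason such rules would need the iterative refinement the Methods mention.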
About the journal:
Diabetes Technology & Therapeutics is the only peer-reviewed journal providing healthcare professionals with information on new devices, drugs, drug delivery systems, and software for managing patients with diabetes. This leading international journal delivers practical information and comprehensive coverage of cutting-edge technologies and therapeutics in the field, and each issue highlights new pharmacological and device developments to optimize patient care.