Developing and Evaluating a Natural Language Processing Algorithm to Detect Insulin Pump and Continuous Glucose Monitor Use in Electronic Health Records of Patients with Type 1 Diabetes.

Impact factor 6.3; CAS Zone 2 (Medicine); JCR Q1, Endocrinology & Metabolism

Diabetes Technology & Therapeutics. Publication date: 2025-09-29. DOI: 10.1177/15209156251383828

Estelle Everett, Ryan A Tiu, Bing Zhu, Jeffrey Feng, Nicholas Jackson, Joyce Graham, Eli Ipp, Nestoras Mathioudakis, Alex A T Bui, Tannaz Moin

Abstract

Objective: We aimed to develop and validate natural language processing (NLP) algorithms to identify insulin pump and continuous glucose monitor (CGM) users using unstructured clinical note data from the electronic health record (EHR).

Methods: We reviewed a random sample of outpatient clinical notes from endocrinologists to catalog how insulin pump and CGM use was documented. We translated these patterns into regular expressions and used them to build rule-based NLP algorithms, which we iteratively refined. We evaluated the final algorithms in a University of California, Los Angeles (UCLA) holdout dataset that included the most recent note from 667 unique patients. We then externally validated the algorithms in a second health system with a different EHR and patient population. Manual chart review served as the gold standard. We assessed performance with measures including sensitivity and specificity. To contextualize algorithm performance, we evaluated the accuracy of billing codes for insulin pump and CGM use within the same UCLA holdout dataset.

Results: In the UCLA holdout dataset, our insulin pump algorithm achieved a sensitivity of 0.90 and specificity of 0.89. The CGM algorithm achieved a sensitivity of 0.85 and specificity of 0.84. The combined algorithm identifying both insulin pump and CGM use showed a sensitivity of 0.76 and specificity of 0.92. In comparison, billing codes underperformed: International Classification of Diseases/Current Procedural Terminology (CPT) codes identified insulin pump use with a sensitivity of 0.09 and specificity of 1.00, whereas CPT codes identified CGM use with a sensitivity of 0.68 and specificity of 0.86. For combined device use, billing codes had a sensitivity of 0.06 and specificity of 1.00. External validation demonstrated similarly strong algorithm performance in the second health system.

Conclusions: We showed that NLP can accurately identify insulin pump and CGM users from unstructured EHR notes, substantially outperforming billing code-based methods. This scalable approach can support system- and population-level evaluations of diabetes technologies.
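The general approach described in the abstract (regular expressions applied to note text, scored against manual chart review) might look roughly like the sketch below. This is not the authors' algorithm: the patterns, the negation rule, the function names, and the toy notes are all assumptions made up for illustration, since the abstract does not list the actual expressions or refinement steps.

```python
import re

# Hypothetical patterns for illustration only; the study's actual
# regular expressions are not given in the abstract.
PUMP_PATTERN = re.compile(r"\b(insulin pump|CSII|omnipod)\b", re.IGNORECASE)
CGM_PATTERN = re.compile(
    r"\b(continuous glucose monitor(?:ing)?|CGM|dexcom|libre)\b", re.IGNORECASE
)
# Assumed negation rule: a cue word shortly before the mention,
# with no sentence-ending period in between.
NEGATION_CUE = re.compile(
    r"\b(no|not|denies|declined|never|without)\b[^.]{0,30}$", re.IGNORECASE
)


def mentions_device(note, pattern):
    """Return True if the note has at least one non-negated match."""
    for match in pattern.finditer(note):
        preceding = note[max(0, match.start() - 40):match.start()]
        if not NEGATION_CUE.search(preceding):
            return True
    return False


def sensitivity_specificity(predicted, gold):
    """Compare predictions with gold-standard labels (e.g. chart review)."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fn = sum(1 for p, g in pairs if not p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fp = sum(1 for p, g in pairs if p and not g)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy notes and labels, invented for this example (not study data).
notes = [
    "Patient continues on insulin pump with Dexcom CGM.",
    "Not on an insulin pump; uses multiple daily injections.",
    "Wears a continuous glucose monitor. No insulin pump.",
    "Discussed diet and exercise.",
    "Previously used an insulin pump, stopped last year.",
]
gold_pump = [True, False, False, False, False]
gold_cgm = [True, False, True, False, False]

pred_pump = [mentions_device(n, PUMP_PATTERN) for n in notes]
pred_cgm = [mentions_device(n, CGM_PATTERN) for n in notes]
# Combined use: both devices must be detected in the same note.
pred_both = [p and c for p, c in zip(pred_pump, pred_cgm)]
gold_both = [p and c for p, c in zip(gold_pump, gold_cgm)]

print("pump:", sensitivity_specificity(pred_pump, gold_pump))
print("cgm:", sensitivity_specificity(pred_cgm, gold_cgm))
print("both:", sensitivity_specificity(pred_both, gold_both))
```

The last toy note shows why iterative refinement matters for rule-based methods: a past-tense mention ("previously used") slips through this simple negation rule and counts as a false positive, which lowers specificity for the pump rule.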

Source journal

Diabetes Technology & Therapeutics (Medicine: Endocrinology & Metabolism)

CiteScore: 10.60

Self-citation rate: 14.80%

Publication volume: 145

Review time: 3-8 weeks

Journal description: Diabetes Technology & Therapeutics is the only peer-reviewed journal providing healthcare professionals with information on new devices, drugs, drug delivery systems, and software for managing patients with diabetes. This leading international journal delivers practical information and comprehensive coverage of cutting-edge technologies and therapeutics in the field, and each issue highlights new pharmacological and device developments to optimize patient care.