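Using natural language processing techniques to identify diabetes-related complications in real-world Hebrew free-text electronic medical records.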

Impact factor: 3.7 · JCR quartile: Q2 · Category: Endocrinology & Metabolism

Journal of Diabetes Science and Technology · Pub Date: 2025-07-01 · Epub Date: 2024-01-30 · DOI: 10.1177/19322968241228555

Mor Saban, Miri Lutski, Inbar Zucker, Moshe Uziel, Dror Ben-Moshe, Ariel Israel, Shlomo Vinker, Avivit Golan-Cohen, Izhar Laufer, Ilan Green, Roy Eldor, Eugene Merzon

Pages: 999-1007 · Publication type: Journal Article · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11571488/pdf/ · DOI link: https://doi.org/10.1177/19322968241228555

Citations: 0

Abstract



Identifying Diabetes Related-Complications in a Real-World Free-Text Electronic Medical Records in Hebrew Using Natural Language Processing Techniques.

Background: Studies have demonstrated that 50% to 80% of patients do not have an International Classification of Diseases (ICD) code assigned to their medical encounter or condition. For these patients, clinical information is mostly recorded as unstructured free-text narrative in the medical record, without standardized coding or extraction of structured data elements. Leumit Health Services (LHS), in collaboration with the Israeli Ministry of Health (MoH), conducted this study using electronic medical records (EMRs) to systematically extract meaningful clinical information about people with diabetes from the unstructured free-text notes.

Objectives: To develop and validate natural language processing (NLP) algorithms to identify diabetes-related complications in the free-text medical records of patients who have LHS membership.

Methods: The study data included 2.3 million records of 41 469 patients with diabetes aged 35 or older, recorded between 2012 and 2017. The diabetes-related complications included cardiovascular disease, diabetic neuropathy, nephropathy, retinopathy, diabetic foot, cognitive impairments, mood disorders, and hypoglycemia. A vocabulary list of terms was determined and adjudicated by two physicians experienced in diabetes care, both board-certified diabetes specialists in endocrinology or family medicine. Two independent registered nurses with PhDs reviewed the free-text medical records. Both rule-based and machine learning techniques were used to develop the NLP algorithm. Precision, recall, and F-score were calculated for each complication to compare (1) the NLP algorithm and (2) the ICD codes against the reviewers' comments.
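As an illustration of the rule-based side of such a pipeline, the sketch below tags a note with every complication whose vocabulary terms appear in it. It is not the authors' algorithm: the abstract does not publish the term list, so `VOCABULARY`, `compile_patterns`, and `tag_note` are hypothetical names, and the English terms are placeholders for what would be Hebrew terms in the study.

```python
import re

# Hypothetical vocabulary for illustration only. The study's actual
# physician-adjudicated term list (in Hebrew) is not given in the abstract.
VOCABULARY = {
    "retinopathy": ["retinopathy", "macular edema"],
    "neuropathy": ["neuropathy", "peripheral nerve damage"],
    "hypoglycemia": ["hypoglycemia", "low blood sugar"],
}


def compile_patterns(vocabulary):
    """Build one case-insensitive, whole-word regex per complication."""
    patterns = {}
    for complication, terms in vocabulary.items():
        alternatives = "|".join(re.escape(term) for term in terms)
        patterns[complication] = re.compile(
            rf"\b(?:{alternatives})\b", re.IGNORECASE
        )
    return patterns


def tag_note(note, patterns):
    """Return the set of complications whose terms occur in the note."""
    return {name for name, pattern in patterns.items() if pattern.search(note)}


PATTERNS = compile_patterns(VOCABULARY)
example_tags = tag_note(
    "Patient reports low blood sugar at night; known diabetic retinopathy.",
    PATTERNS,
)
```

A matcher this simple ignores negation ("no retinopathy"), family history, and spelling variants, which is one reason a rule-based step is often paired with machine learning.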

Results: Measured against the reviewers (the gold standard), the NLP algorithm performed well overall, with a mean F-score of 86%. The ICD codes achieved a mean F-score of only 51%.
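To make the reported metric concrete, here is a minimal sketch of how precision, recall, and F-score can be computed against gold-standard labels and then averaged across complications. It rests on two assumptions the abstract does not confirm: that "F-score" means the balanced F1, and that the mean is an unweighted average over complications. The function names and the toy labels are hypothetical and are not study data.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for parallel lists of 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)      # true positives
    fp = sum(1 for g, p in pairs if not g and p)  # false positives
    fn = sum(1 for g, p in pairs if g and not p)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f1(gold_by_complication, predicted_by_complication):
    """Unweighted mean of per-complication F1 (assumed averaging scheme)."""
    scores = [
        precision_recall_f1(gold_by_complication[name],
                            predicted_by_complication[name])[2]
        for name in gold_by_complication
    ]
    return sum(scores) / len(scores)


# Toy labels for four hypothetical records; 1 = complication present.
gold = {"retinopathy": [1, 1, 0, 0], "hypoglycemia": [1, 0, 1, 0]}
nlp = {"retinopathy": [1, 1, 0, 0], "hypoglycemia": [1, 0, 0, 0]}
toy_mean = mean_f1(gold, nlp)
```

The same two functions can score the ICD codes by passing ICD-derived labels in place of `nlp`, which mirrors the two comparisons described in the Methods.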

Conclusion: NLP algorithms and machine learning processes may enable more accurate identification of diabetes complications in EMR data.


Source journal

Journal of Diabetes Science and Technology (Medicine-Internal Medicine)

CiteScore

7.50

Self-citation rate

12.00%

Articles published

148

About the journal: The Journal of Diabetes Science and Technology (JDST) is a bimonthly, peer-reviewed scientific journal published by the Diabetes Technology Society. JDST covers scientific and clinical aspects of diabetes technology including glucose monitoring, insulin and metabolic peptide delivery, the artificial pancreas, digital health, precision medicine, social media, cybersecurity, software for modeling, physiologic monitoring, technology for managing obesity, and diagnostic tests of glycation. The journal also covers the development and use of mobile applications and wireless communication, as well as bioengineered tools such as MEMS, new biomaterials, and nanotechnology to develop new sensors. Articles in JDST cover both basic research and clinical applications of technologies being developed to help people with diabetes.