Identifying Diabetes Related-Complications in a Real-World Free-Text Electronic Medical Records in Hebrew Using Natural Language Processing Techniques.
Mor Saban, Miri Lutski, Inbar Zucker, Moshe Uziel, Dror Ben-Moshe, Ariel Israel, Shlomo Vinker, Avivit Golan-Cohen, Izhar Laufer, Ilan Green, Roy Eldor, Eugene Merzon
Journal of Diabetes Science and Technology, pages 999-1007. Publication date: 2025-07-01 (Epub 2024-01-30). DOI: https://doi.org/10.1177/19322968241228555
Abstract
Background: Studies have demonstrated that 50% to 80% of patients do not receive an International Classification of Diseases (ICD) code assigned to their medical encounter or condition. For these patients, their clinical information is mostly recorded as unstructured free-text narrative data in the medical record without standardized coding or extraction of structured data elements. Leumit Health Services (LHS) in collaboration with the Israeli Ministry of Health (MoH) conducted this study using electronic medical records (EMRs) to systematically extract meaningful clinical information about people with diabetes from the unstructured free-text notes.
Objectives: To develop and validate natural language processing (NLP) algorithms to identify diabetes-related complications in the free-text medical records of patients who have LHS membership.
Methods: The study data included 2.3 million records of 41 469 patients with diabetes aged 35 or older, recorded between 2012 and 2017. The diabetes-related complications included cardiovascular disease, diabetic neuropathy, nephropathy, retinopathy, diabetic foot, cognitive impairments, mood disorders, and hypoglycemia. A vocabulary list of terms was determined and adjudicated by two physicians who are experienced in diabetes care and are board-certified diabetes specialists in endocrinology or family medicine. Two independent registered nurses with PhDs reviewed the free-text medical records. Both rule-based and machine learning techniques were used to develop the NLP algorithm. Precision, recall, and F-score were calculated for each complication to compare (1) the NLP algorithm with the reviewers' comments and (2) the ICD codes with the reviewers' comments.
Results: Measured against the reviewers (gold standard), the NLP algorithm achieved good overall performance, with a mean F-score of 86%. The ICD codes performed worse, with a mean F-score of only 51%.
Conclusion: NLP algorithms and machine learning processes may enable more accurate identification of diabetes complications in EMR data.
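The evaluation named in the Methods (precision, recall, and F-score per complication, then a mean F-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the F-score is the balanced F1 measure, that labels are compared as sets of patient identifiers per complication, and that the mean is an unweighted average across complications; the abstract does not state any of these details. The function names, patient IDs, and labels below are hypothetical toy values, not study data.

```python
from typing import Dict, Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Compare predicted positives with gold-standard positives for one complication."""
    tp = len(predicted & gold)   # flagged by the method and confirmed by reviewers
    fp = len(predicted - gold)   # flagged by the method but not by reviewers
    fn = len(gold - predicted)   # confirmed by reviewers but missed by the method
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_f_score(predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Unweighted mean of per-complication F1 scores (an assumed averaging scheme)."""
    scores = [
        precision_recall_f1(predicted.get(name, set()), gold_ids)[2]
        for name, gold_ids in gold.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy labels: complication -> IDs of patients judged to have it.
reviewers = {
    "retinopathy": {"p1", "p2", "p3", "p4"},
    "nephropathy": {"p1", "p5"},
}
nlp_output = {
    "retinopathy": {"p1", "p2", "p3", "p6"},
    "nephropathy": {"p1", "p5"},
}

if __name__ == "__main__":
    for name, gold_ids in reviewers.items():
        p, r, f = precision_recall_f1(nlp_output.get(name, set()), gold_ids)
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
    print(f"mean F1 = {mean_f_score(nlp_output, reviewers):.3f}")
```

The same two functions could be applied to a second dictionary built from ICD codes, which would give the comparison of ICD coding against the reviewers that the abstract reports alongside the NLP result.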
About the journal:
The Journal of Diabetes Science and Technology (JDST) is a bi-monthly, peer-reviewed scientific journal published by the Diabetes Technology Society. JDST covers scientific and clinical aspects of diabetes technology including glucose monitoring, insulin and metabolic peptide delivery, the artificial pancreas, digital health, precision medicine, social media, cybersecurity, software for modeling, physiologic monitoring, technology for managing obesity, and diagnostic tests of glycation. The journal also covers the development and use of mobile applications and wireless communication, as well as bioengineered tools such as MEMS, new biomaterials, and nanotechnology to develop new sensors. Articles in JDST cover both basic research and clinical applications of technologies being developed to help people with diabetes.