Large language models aided patient progression documentation according to the ICD standard

Q1 Medicine

Informatics in Medicine Unlocked. Pub Date: 2025-01-01. DOI: 10.1016/j.imu.2025.101637

Nuria Lebeña, Arantza Casillas, Alicia Pérez

Volume 55, Article 101637. Source: https://www.sciencedirect.com/science/article/pii/S2352914825000255

Citations: 0

Abstract

Background and Objective

Healthcare documentation processing is becoming increasingly efficient and effective as a result of advances in machine learning and natural language processing (NLP). One challenge in clinical practice is the early detection of a patient's potential future diagnoses, which is crucial for preventive medicine. Estimating potential future diagnoses helps to speed up the management of Electronic Health Records (EHRs) and opens a path towards clinical prevention. It is a challenging task, as there are thousands of possible diseases and, in general, limited data is available to train systems due to privacy concerns.

The objective of this study is to infer probable future diagnoses given a patient's diagnosis history. In previous work, this task has been carried out using structured data, such as ICD-coded diagnoses, overlooking the unstructured textual information in EHRs. Unlike traditional methods, this study aims to enhance next-diagnosis prediction by integrating patient diagnosis information coded according to the International Classification of Diseases (ICD) with unstructured clinical text.

Methods:

We propose a multi-faceted model that integrates structured ICD-encoded patient histories with unstructured EHR text for future diagnosis prediction. Our approach consists of (1) a sequential model trained on structured diagnosis timelines, (2) a Clinical Longformer-based model trained on unstructured EHRs, and (3) an ensemble strategy to combine predictions from both components.
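The abstract does not say how the ensemble combines the two components. As a minimal sketch of one common option, the snippet below takes a weighted average of per-code scores from the two models and ranks the ICD codes by the combined score. The function names, the `weight` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def ensemble_scores(
    seq_scores: Dict[str, float],
    text_scores: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of per-ICD-code scores from two models.

    ASSUMPTION: weighted averaging is an illustrative choice; the paper's
    actual ensemble strategy is not described in the abstract.
    A code missing from one model is scored 0.0 by that model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    codes = set(seq_scores) | set(text_scores)
    return {
        code: weight * seq_scores.get(code, 0.0)
        + (1.0 - weight) * text_scores.get(code, 0.0)
        for code in codes
    }


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring codes (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [code for code, _ in ranked[:k]]


# Hypothetical scores from the sequential model and the text model.
seq = {"I10": 0.6, "E11": 0.3, "J45": 0.1}
text = {"I10": 0.2, "E11": 0.7, "N18": 0.4}

combined = ensemble_scores(seq, text, weight=0.5)
print(top_k(combined, 2))
```

In a setup like this, `weight` would typically be tuned on a validation set; a value of 1.0 falls back to the structured-timeline model alone.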

Results:

Our proposed ensemble strategy significantly outperforms current state-of-the-art approaches in predicting future diagnoses, achieving a Precision@5 of 72.34% and a Precision@20 of 77.49%. It also showed high robustness and reliability across different demographic groups and across varying amounts of medical history.
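The abstract does not define Precision@k. The sketch below assumes one variant that appears in diagnosis-prediction work, in which the number of correct codes among the top k predictions is divided by min(k, number of true codes). Under that assumed definition the score can increase with k, which would be consistent with Precision@20 exceeding Precision@5. The function name and example data are hypothetical.

```python
from typing import Iterable, Sequence


def precision_at_k(
    ranked_codes: Sequence[str],
    true_codes: Iterable[str],
    k: int,
) -> float:
    """Hits in the top-k predictions divided by min(k, number of true codes).

    ASSUMPTION: this normalisation is not confirmed by the abstract.
    ranked_codes is expected to hold unique codes, best first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    true_set = set(true_codes)
    if not true_set:
        raise ValueError("true_codes must not be empty")
    hits = sum(1 for code in ranked_codes[:k] if code in true_set)
    return hits / min(k, len(true_set))


# Hypothetical ranking and ground truth for one patient visit.
ranked = ["E11", "I10", "N18", "J45"]
truth = {"I10", "N18", "K21"}

print(precision_at_k(ranked, truth, k=2))  # 1 hit out of min(2, 3) = 2
print(precision_at_k(ranked, truth, k=4))  # 2 hits out of min(4, 3) = 3
```

A reported figure would normally be the mean of this per-visit value over all evaluated visits.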

Conclusion:

This research demonstrates that integrating structured ICD diagnosis timelines with unstructured EHRs achieves better results than using structured diagnosis timelines alone. Notably, the proposed model maintained high accuracy even with a short-term history of diagnoses.


Source journal

Informatics in Medicine Unlocked (Medicine: Health Informatics)

CiteScore

9.50

Self-citation rate

0.00%

Articles published

282

Review time

39 days

About the journal: Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.