CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records

IF 4 2区 医学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Hao Guan , John Novoa-Laurentiev , Li Zhou
{"title":"CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records","authors":"Hao Guan ,&nbsp;John Novoa-Laurentiev ,&nbsp;Li Zhou","doi":"10.1016/j.jbi.2025.104830","DOIUrl":null,"url":null,"abstract":"<div><h3>Background:</h3><div>Early detection of cognitive decline during the preclinical stage of Alzheimer’s disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline.</div></div><div><h3>Methods:</h3><div>We collected clinical notes from 2,166 patients spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis from the Enterprise Data Warehouse of Mass General Brigham. To train the model, we developed CD-Tron, built upon a large clinical language model that was finetuned using 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP values (SHapley Additive exPlanations), to interpret the model’s predictions and provide insight into the most influential features. Error analysis was also facilitated to further analyze the model’s prediction.</div></div><div><h3>Results:</h3><div>CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC metrics for detecting cognitive decline (CD). Tested on many real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, crucial for clinical applications prioritizing early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding.</div></div><div><h3>Conclusion:</h3><div>CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"166 ","pages":"Article 104830"},"PeriodicalIF":4.0000,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1532046425000590","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Background:

Early detection of cognitive decline during the preclinical stage of Alzheimer’s disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline.

Methods:

We collected clinical notes from 2,166 patients spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis from the Enterprise Data Warehouse of Mass General Brigham. To train the model, we developed CD-Tron, built upon a large clinical language model that was finetuned using 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP values (SHapley Additive exPlanations), to interpret the model’s predictions and provide insight into the most influential features. Error analysis was also facilitated to further analyze the model’s prediction.

Results:

CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC metrics for detecting cognitive decline (CD). Tested on many real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, crucial for clinical applications prioritizing early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding.

Conclusion:

CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.

Abstract Image

CD-Tron:利用大型临床语言模型从电子健康记录中早期检测认知能力下降
背景:在阿尔茨海默病及相关痴呆(AD/ADRD)的临床前阶段早期发现认知能力下降对于及时干预和治疗至关重要。电子健康记录中的临床记录包含有价值的信息,可以帮助早期识别认知能力下降。在这项研究中,我们利用先进的大型临床语言模型,对临床笔记进行微调,以提高对认知衰退的早期检测。方法:我们从布里格姆总医院企业数据仓库中收集了2166名患者在首次轻度认知障碍(MCI)诊断前4年的临床记录。为了训练模型,我们开发了CD-Tron,建立在一个大型临床语言模型上,该模型使用4,949个专家标记的音符部分进行微调。为了评估,将训练好的模型应用于1996个独立的笔记部分,以评估其在真实世界非结构化临床数据上的表现。此外,我们使用可解释的人工智能技术,特别是SHAP值(SHapley Additive exPlanations),来解释模型的预测,并提供对最具影响力特征的见解。误差分析也便于进一步分析模型的预测结果。结果:CD- tron显着优于基线模型,在检测认知衰退(CD)的精度,召回率和AUC指标方面取得了显着改善。CD- tron在许多实际临床记录中进行了测试,显示出高灵敏度,只有一个假阴性,这对于临床应用优先考虑早期和准确的乳糜泻检测至关重要。基于shap的可解释性分析强调了有助于模型预测的关键文本特征,支持透明度和临床医生的理解。结论:CD-Tron通过将大型临床语言模型应用于自由文本电子病历数据,提供了一种早期认知衰退检测的新方法。在现实世界的临床记录上进行预训练,它能准确地识别早期认知能力下降,并将SHAP整合到可解释性中,提高预测的透明度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Biomedical Informatics
Journal of Biomedical Informatics 医学-计算机:跨学科应用
CiteScore
8.90
自引率
6.70%
发文量
243
审稿时长
32 days
期刊介绍: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信