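Improving mortality prediction after radiotherapy with large language model structuring of large-scale unstructured electronic health records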

IF 5.3 | Zone 1, Medicine | JCR Q1, ONCOLOGY

Radiotherapy and Oncology, Volume 211, Article 111052 | Pub Date: 2025-07-19 | DOI: 10.1016/j.radonc.2025.111052

Sangjoon Park, Chan Woo Wee, Seo Hee Choi, Kyung Hwan Kim, Jee Suk Chang, Hong In Yoon, Ik Jae Lee, Yong Bae Kim, Jaeho Cho, Ki Chang Keum, Chang Geol Lee, Hwa Kyung Byun, Woong Sub Koom


Citations: 0

Abstract


Background and purpose

Avoiding unnecessary radiotherapy (RT) in patients with limited life expectancy requires accurate selection. Traditional survival models based on structured data often lack precision. Large language models (LLMs) offer a novel approach to structuring unstructured electronic health record (EHR) data, potentially improving survival predictions by integrating comprehensive clinical information.

Materials and methods

We analyzed structured and unstructured data from 34,276 RT-treated patients at Yonsei Cancer Center. An open-source LLM structured unstructured EHR data using single-shot learning. External validation included 852 patients from Yongin Severance Hospital. We compared the LLM’s performance against a domain-specific medical LLM and a smaller variant. Survival prediction models using statistical, machine-learning, and deep-learning approaches incorporated both structured and LLM-structured data.
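The structuring step described above can be illustrated with a minimal sketch of a one-shot prompt. This is not the authors' prompt or pipeline: the field names, the example note, and the helper names `build_prompt` and `parse_answer` are assumptions made for illustration, and the call to the LLM itself is deliberately left out.

```python
import json

# Hypothetical target schema; the abstract names only "general condition"
# and "disease extent" as examples of structured features.
FIELDS = ("general_condition", "disease_extent")

# A single worked example (the "one shot"); invented for illustration.
EXAMPLE_NOTE = "Pt bedridden most of the day. Multiple liver and bone mets."
EXAMPLE_OUTPUT = {"general_condition": "poor", "disease_extent": "widespread"}


def build_prompt(note: str) -> str:
    """Assemble a one-shot prompt: instruction, one example, then the new note."""
    return (
        "Extract the following fields from the clinical note and answer "
        f"with JSON only. Fields: {', '.join(FIELDS)}.\n\n"
        f"Note: {EXAMPLE_NOTE}\n"
        f"Answer: {json.dumps(EXAMPLE_OUTPUT)}\n\n"
        f"Note: {note}\n"
        "Answer:"
    )


def parse_answer(raw: str) -> dict:
    """Parse the model's reply, keeping only expected fields.

    Fields the model omitted, or replies that are not a JSON object,
    yield None values rather than raising.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in FIELDS}
```

In such a setup, the string returned by `build_prompt` would be sent to whichever model is in use, and `parse_answer` would turn the reply into columns that can sit alongside the structured data.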

Results

The open-source LLM structured unstructured EHR data with 87.5% accuracy, outperforming the domain-specific medical LLM (35.8%). Larger LLMs were more effective in structuring clinically relevant features, such as general condition and disease extent, which correlated with survival. Incorporating LLM-structured features improved the deep learning model’s C-index from 0.737 to 0.820 (internal validation) and from 0.779 to 0.842 (external validation). Risk stratification was also enhanced, with clearer differentiation among low-, intermediate-, and high-risk groups (p < 0.001). Additionally, models became more interpretable, as key LLM-structured features aligned with statistically significant predictors traditionally identified from structured data.
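For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It assumes right-censored data and a risk score where higher means earlier expected death; the abstract does not say which C-index variant or software the authors used, so this is only an illustration of the metric.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected death)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before patient j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a move from 0.737 to 0.820 means a larger share of comparable patient pairs is ranked in the correct order.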

Conclusion

General-domain LLMs, despite not being fine-tuned for medical data, can effectively structure large-scale unstructured EHRs, significantly improving survival prediction accuracy and model interpretability. The RT-Surv framework highlights the potential of LLMs to enhance clinical decision-making and optimize RT treatment.


Source journal

Radiotherapy and Oncology (Medicine: Nuclear Medicine)

CiteScore: 10.30
Self-citation rate: 10.50%
Articles published: 2445
Review time: 45 days

About the journal: Radiotherapy and Oncology publishes papers describing original research as well as review articles. It covers areas of interest relating to radiation oncology. This includes: clinical radiotherapy, combined modality treatment, translational studies, epidemiological outcomes, imaging, dosimetry, and radiation therapy planning, experimental work in radiobiology, chemobiology, hyperthermia and tumour biology, as well as data science in radiation oncology and physics aspects relevant to oncology. Papers on more general aspects of interest to the radiation oncologist, including chemotherapy, surgery and immunology, are also published.