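Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.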

Impact Factor: 2.8 · JCR: Q2 · Category: Oncology

JCO Clinical Cancer Informatics, vol. 9, e2500002 · Pub Date: 2025-06-01 · Epub Date: 2025-06-27 · DOI: 10.1200/CCI-25-00002

Amara Tariq, Madhu Sikha, Allison W Kurian, Kevin Ward, Theresa H M Keegan, Daniel L Rubin, Imon Banerjee


Citations: 0

Abstract


Open-Source Hybrid Large Language Model Integrated System for Extraction of Breast Cancer Treatment Pathway From Free-Text Clinical Notes.

Purpose: Automated curation of breast cancer treatment data with minimal human involvement could accelerate the collection of statewide and nationwide evidence for patient management and assessing the effectiveness of treatment pathways. The primary challenges are the complexity and inconsistency of structured clinical data streams and accurate extraction of this information from free-text clinical narratives.

Materials and methods: We proposed a hybrid two-phase information extraction framework that combined a Unified Medical Language System parser (phase-1) with a fine-tuned large language model (LLM; phase-2) to extract longitudinal treatment timelines from time-stamped clinical notes. Our framework was developed through end-to-end joint learning as a question-answering model, where the model was trained to simultaneously answer five questions, each corresponding to a specific treatment.
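The two-phase structure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the five treatments, so the labels below are placeholders; the small term dictionary is a toy stand-in for a Unified Medical Language System parser; the idea that phase 1 acts as a filter on notes is a guess at its role; and `phase2_answer` is a keyword stub where the authors use a fine-tuned LLM. Only the overall shape (phase 1 on each note, then one joint answer to five questions, ordered by note date) follows the text.

```python
from dataclasses import dataclass
from datetime import date

# Placeholder labels: the abstract says there are five questions, one per
# treatment, but does not name the treatments. These are illustrative only.
TREATMENTS = ["surgery", "chemotherapy", "radiation", "endocrine", "targeted"]

# Toy term list standing in for a UMLS parser (phase 1). Hypothetical.
PHASE1_TERMS = {
    "surgery": ["mastectomy", "lumpectomy"],
    "chemotherapy": ["chemotherapy", "paclitaxel"],
    "radiation": ["radiation", "radiotherapy"],
    "endocrine": ["tamoxifen", "letrozole"],
    "targeted": ["trastuzumab"],
}


@dataclass
class Note:
    patient_id: str
    note_date: date
    text: str


def phase1_filter(notes):
    """Phase 1 (assumed role): keep notes that mention any treatment term."""
    all_terms = [t for terms in PHASE1_TERMS.values() for t in terms]
    return [n for n in notes if any(t in n.text.lower() for t in all_terms)]


def phase2_answer(note):
    """Phase 2 stub: answer all five yes/no questions for one note at once.

    A real system would call a fine-tuned LLM here; this keyword match
    only mimics the shape of the output (one answer per treatment).
    """
    text = note.text.lower()
    return {tr: any(t in text for t in PHASE1_TERMS[tr]) for tr in TREATMENTS}


def build_timeline(notes):
    """Order the filtered notes by date and record which treatments each reports."""
    timeline = []
    for note in sorted(phase1_filter(notes), key=lambda n: n.note_date):
        answers = phase2_answer(note)
        timeline.append((note.note_date, [tr for tr in TREATMENTS if answers[tr]]))
    return timeline


notes = [
    Note("p1", date(2019, 3, 1), "Started paclitaxel today."),
    Note("p1", date(2019, 1, 5), "Patient underwent lumpectomy."),
    Note("p1", date(2019, 2, 1), "Routine follow-up, no complaints."),
]
timeline = build_timeline(notes)
for when, given in timeline:
    print(when.isoformat(), given)
```

Returning all five answers from a single call mirrors the joint question-answering setup in the abstract, as opposed to running five independent classifiers per note.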

Results: We fine-tuned and internally validated the model on 26,692 patients with breast cancer (diagnosed between 2013 and 2020) receiving treatment at Mayo Clinic and externally validated the model on 162 randomly selected patients from Stanford Healthcare. The zero-shot (out-of-the-box) LLM had high specificity but low sensitivity, indicating that although such models are useful for generic language understanding, they fall short on targeted clinical tasks. The proposed model achieved an average AUROC of 0.942 on the internal data and 0.924 on the external data, showing only a marginal drop in performance on external evaluation. The proposed model also achieved a better trade-off between sensitivity (average: 79.2%) and specificity (average: 76.2%) than rule-based extraction (average sensitivity: 70.5%, average specificity: 68.1%) and structured codes (average sensitivity: 64.1%, average specificity: 83.5%).
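For readers less familiar with the reported metrics, the sketch below shows how per-treatment sensitivity, specificity, and AUROC can be computed and then averaged across treatments. It is an assumption-laden illustration, not the authors' evaluation code: the treatment names, labels, scores, and the 0.5 decision threshold are all invented toy values, and the abstract does not say how its averages were formed (a plain unweighted mean is assumed here).

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(data, threshold=0.5):
    """Unweighted mean of each metric over treatments (assumed averaging scheme)."""
    sens, spec, aucs = [], [], []
    for y_true, scores in data.values():
        y_pred = [1 if s >= threshold else 0 for s in scores]
        se, sp = sensitivity_specificity(y_true, y_pred)
        sens.append(se)
        spec.append(sp)
        aucs.append(auroc(y_true, scores))
    n = len(data)
    return {
        "sensitivity": sum(sens) / n,
        "specificity": sum(spec) / n,
        "auroc": sum(aucs) / n,
    }


# Toy data for two hypothetical treatments: (true labels, model scores).
toy = {
    "treatment_a": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    "treatment_b": ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.3]),
}
result = macro_average(toy)
print(result)
```

Sensitivity and specificity depend on the chosen threshold while AUROC does not, which is why a model can be compared on AUROC separately from its sensitivity/specificity trade-off. On the figures quoted in the abstract, the internal-to-external change in average AUROC is 0.942 − 0.924 = 0.018.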

Conclusion: The proposed framework can extract temporal information about cancer treatments from various time-stamped clinic notes, regardless of the setting of treatment administration (inpatient or outpatient) or time frame. To support the cancer research community in such data curation and longitudinal analysis, we have packaged the code as a Docker image, which needs minimal system reconfiguration, and shared it under an open-source academic license.


Source journal

JCO Clinical Cancer Informatics (Oncology)

CiteScore: 6.20

Self-citation rate: 4.80%

Articles published: 190