Automated Resectability Classification of Pancreatic Cancer CT Reports with Privacy-Preserving Open-Weight Large Language Models: A Multicenter Study.

IF 5.7 3区医学 Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Medical Systems Pub Date : 2025-09-24 DOI:10.1007/s10916-025-02248-2

Jeong Hyun Lee, Ji Hye Min, Kyowon Gu, Seungchul Han, Jeong Ah Hwang, Seo-Youn Choi, Kyoung Doo Song, Jeong Eun Lee, Jisun Lee, Ji Eun Moon, Hasmik Adetyan, Ju Dong Yang

{"title":"Automated Resectability Classification of Pancreatic Cancer CT Reports with Privacy-Preserving Open-Weight Large Language Models: A Multicenter Study.","authors":"Jeong Hyun Lee, Ji Hye Min, Kyowon Gu, Seungchul Han, Jeong Ah Hwang, Seo-Youn Choi, Kyoung Doo Song, Jeong Eun Lee, Jisun Lee, Ji Eun Moon, Hasmik Adetyan, Ju Dong Yang","doi":"10.1007/s10916-025-02248-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong> To evaluate the effectiveness of open-weight large language models (LLMs) in extracting key radiological features and determining National Comprehensive Cancer Network (NCCN) resectability status from free-text radiology reports for pancreatic ductal adenocarcinoma (PDAC). Methods. Prompts were developed using 30 fictitious reports, internally validated on 100 additional fictitious reports, and tested using 200 real reports from two institutions (January 2022 to December 2023). Two radiologists established ground truth for 18 key features and resectability status. Gemma-2-27b-it and Llama-3-70b-instruct models were evaluated using recall, precision, F1-score, extraction accuracy, and overall resectability accuracy. Statistical analyses included McNemar's test and mixed-effects logistic regression. Results. In internal validation, Llama had significantly higher recall than Gemma (99% vs. 95%, p < 0.01) and slightly higher extraction accuracy (98% vs. 97%). Llama also demonstrated higher overall resectability accuracy (93% vs. 91%). In the internal test set, both models achieved 96% recall and 96% extraction accuracy. Overall resectability accuracy was 95% for Llama and 93% for Gemma. In the external test set, both models had 93% recall. Extraction accuracy was 93% for Llama and 95% for Gemma. Gemma achieved higher overall resectability accuracy (89% vs. 83%), but the difference was not statistically significant (p > 0.05). Conclusion. Open-weight models accurately extracted key radiological features and determined NCCN resectability status from free-text PDAC reports. While internal dataset performance was robust, performance on external data decreased, highlighting the need for institution-specific optimization.</p>","PeriodicalId":16338,"journal":{"name":"Journal of Medical Systems","volume":"49 1","pages":"118"},"PeriodicalIF":5.7000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Systems","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10916-025-02248-2","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}

引用次数: 0

Abstract

Purpose: To evaluate the effectiveness of open-weight large language models (LLMs) in extracting key radiological features and determining National Comprehensive Cancer Network (NCCN) resectability status from free-text radiology reports for pancreatic ductal adenocarcinoma (PDAC). Methods. Prompts were developed using 30 fictitious reports, internally validated on 100 additional fictitious reports, and tested using 200 real reports from two institutions (January 2022 to December 2023). Two radiologists established ground truth for 18 key features and resectability status. Gemma-2-27b-it and Llama-3-70b-instruct models were evaluated using recall, precision, F1-score, extraction accuracy, and overall resectability accuracy. Statistical analyses included McNemar's test and mixed-effects logistic regression. Results. In internal validation, Llama had significantly higher recall than Gemma (99% vs. 95%, p < 0.01) and slightly higher extraction accuracy (98% vs. 97%). Llama also demonstrated higher overall resectability accuracy (93% vs. 91%). In the internal test set, both models achieved 96% recall and 96% extraction accuracy. Overall resectability accuracy was 95% for Llama and 93% for Gemma. In the external test set, both models had 93% recall. Extraction accuracy was 93% for Llama and 95% for Gemma. Gemma achieved higher overall resectability accuracy (89% vs. 83%), but the difference was not statistically significant (p > 0.05). Conclusion. Open-weight models accurately extracted key radiological features and determined NCCN resectability status from free-text PDAC reports. While internal dataset performance was robust, performance on external data decreased, highlighting the need for institution-specific optimization.

查看原文本刊更多论文

基于隐私保护开放权重大语言模型的胰腺癌CT报告可切除性自动分类：一项多中心研究。

目的：评估开放重量大语言模型（LLMs）在从自由文本胰腺导管腺癌（PDAC）放射学报告中提取关键放射特征和确定国家综合癌症网络（NCCN）可切除状态方面的有效性。方法。使用30个虚构报告开发提示，对另外100个虚构报告进行内部验证，并使用来自两个机构（2022年1月至2023年12月）的200个真实报告进行测试。两名放射科医生建立了18个关键特征和可切除状态的基本事实。对gma -2-27b-it和llama -3-70b- directive模型进行召回率、精密度、f1评分、提取准确度和总体可切除准确度评估。统计分析包括McNemar检验和混合效应logistic回归。结果。在内部验证中，Llama的召回率显著高于Gemma（99%比95%，p 0.05）。结论。开重模型准确地提取了关键的放射学特征，并从自由文本PDAC报告中确定了NCCN的可切除状态。虽然内部数据集的性能是稳健的，但外部数据的性能下降，突出了机构特定优化的必要性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Medical Systems 医学-卫生保健

CiteScore

11.60

自引率

1.90%

发文量

审稿时长

4.8 months

期刊介绍： Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital clinic and physician''s office administration; pathology radiology and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles essays and studies across the entire scale of medical systems from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems the journal includes a special section devoted to status reports on current installations.