使用GPT-4将胰腺癌混合语言自由文本CT报告转换为国家综合癌症网络结构化报告模板。

IF 5.3 2区医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Korean Journal of Radiology Pub Date : 2025-06-01 Epub Date: 2025-04-17 DOI:10.3348/kjr.2024.1228

Hokun Kim, Bohyun Kim, Moon Hyung Choi, Joon-Il Choi, Soon Nam Oh, Sung Eun Rha

{"title":"使用GPT-4将胰腺癌混合语言自由文本CT报告转换为国家综合癌症网络结构化报告模板。","authors":"Hokun Kim, Bohyun Kim, Moon Hyung Choi, Joon-Il Choi, Soon Nam Oh, Sung Eun Rha","doi":"10.3348/kjr.2024.1228","DOIUrl":null,"url":null,"abstract":"Objective: To evaluate the feasibility of generative pre-trained transformer-4 (GPT-4) in generating structured reports (SRs) from mixed-language (English and Korean) narrative-style CT reports for pancreatic ductal adenocarcinoma (PDAC) and to assess its accuracy in categorizing PDCA resectability.Materials and methods: This retrospective study included consecutive free-text reports of pancreas-protocol CT for staging PDAC, from two institutions, written in English or Korean from January 2021 to December 2023. Both the GPT-4 Turbo and GPT-4o models were provided prompts along with the free-text reports via an application programming interface and tasked with generating SRs and categorizing tumor resectability according to the National Comprehensive Cancer Network guidelines version 2.2024. Prompts were optimized using the GPT-4 Turbo model and 50 reports from Institution B. The performances of the GPT-4 Turbo and GPT-4o models in the two tasks were evaluated using 115 reports from Institution A. Results were compared with a reference standard that was manually derived by an abdominal radiologist. Each report was consecutively processed three times, with the most frequent response selected as the final output. Error analysis was guided by the decision rationale provided by the models.Results: Of the 115 narrative reports tested, 96 (83.5%) contained both English and Korean. For SR generation, GPT-4 Turbo and GPT-4o demonstrated comparable accuracies (92.3% [1592/1725] and 92.2% [1590/1725], respectively; P = 0.923). In the resectability categorization, GPT-4 Turbo showed higher accuracy than GPT-4o (81.7% [94/115] vs. 67.0% [77/115], respectively; P = 0.002). In the error analysis of GPT-4 Turbo, the SR generation error rate was 7.7% (133/1725 items), which was primarily attributed to inaccurate data extraction (54.1% [72/133]). The resectability categorization error rate was 18.3% (21/115), with the main cause being violation of the resectability criteria (61.9% [13/21]).Conclusion: Both GPT-4 Turbo and GPT-4o demonstrated acceptable accuracy in generating NCCN-based SRs on PDACs from mixed-language narrative reports. However, oversight by human radiologists is essential for determining resectability based on CT findings.","PeriodicalId":17881,"journal":{"name":"Korean Journal of Radiology","volume":" ","pages":"557-568"},"PeriodicalIF":5.3000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123081/pdf/","citationCount":"0","resultStr":"{\"title\":\"Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.\",\"authors\":\"Hokun Kim, Bohyun Kim, Moon Hyung Choi, Joon-Il Choi, Soon Nam Oh, Sung Eun Rha\",\"doi\":\"10.3348/kjr.2024.1228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective: To evaluate the feasibility of generative pre-trained transformer-4 (GPT-4) in generating structured reports (SRs) from mixed-language (English and Korean) narrative-style CT reports for pancreatic ductal adenocarcinoma (PDAC) and to assess its accuracy in categorizing PDCA resectability.Materials and methods: This retrospective study included consecutive free-text reports of pancreas-protocol CT for staging PDAC, from two institutions, written in English or Korean from January 2021 to December 2023. Both the GPT-4 Turbo and GPT-4o models were provided prompts along with the free-text reports via an application programming interface and tasked with generating SRs and categorizing tumor resectability according to the National Comprehensive Cancer Network guidelines version 2.2024. Prompts were optimized using the GPT-4 Turbo model and 50 reports from Institution B. The performances of the GPT-4 Turbo and GPT-4o models in the two tasks were evaluated using 115 reports from Institution A. Results were compared with a reference standard that was manually derived by an abdominal radiologist. Each report was consecutively processed three times, with the most frequent response selected as the final output. Error analysis was guided by the decision rationale provided by the models.Results: Of the 115 narrative reports tested, 96 (83.5%) contained both English and Korean. For SR generation, GPT-4 Turbo and GPT-4o demonstrated comparable accuracies (92.3% [1592/1725] and 92.2% [1590/1725], respectively; P = 0.923). In the resectability categorization, GPT-4 Turbo showed higher accuracy than GPT-4o (81.7% [94/115] vs. 67.0% [77/115], respectively; P = 0.002). In the error analysis of GPT-4 Turbo, the SR generation error rate was 7.7% (133/1725 items), which was primarily attributed to inaccurate data extraction (54.1% [72/133]). The resectability categorization error rate was 18.3% (21/115), with the main cause being violation of the resectability criteria (61.9% [13/21]).Conclusion: Both GPT-4 Turbo and GPT-4o demonstrated acceptable accuracy in generating NCCN-based SRs on PDACs from mixed-language narrative reports. However, oversight by human radiologists is essential for determining resectability based on CT findings.\",\"PeriodicalId\":17881,\"journal\":{\"name\":\"Korean Journal of Radiology\",\"volume\":\" \",\"pages\":\"557-568\"},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123081/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Korean Journal of Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3348/kjr.2024.1228\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/4/17 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Korean Journal of Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3348/kjr.2024.1228","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/17 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}

引用次数: 0

摘要

目的：评价生成式预训练转换器-4 （GPT-4）从混合语言（英语和韩语）叙述性CT报告中生成结构化报告（SRs）的可行性，并评估其对胰腺导管腺癌（PDAC）可切除性分类的准确性。材料和方法：本回顾性研究纳入了两家机构在2021年1月至2023年12月期间用英文或韩文撰写的胰腺CT分期PDAC的连续自由文本报告。GPT-4 Turbo和gpt - 40模型都通过应用程序编程接口提供提示和自由文本报告，并根据国家综合癌症网络指南2.2024版本生成SRs和肿瘤可切除性分类。使用GPT-4 Turbo模型和来自b机构的50份报告对提示进行优化，使用来自a机构的115份报告评估GPT-4 Turbo和gpt - 40模型在两个任务中的性能，并将结果与由腹部放射科医生手动导出的参考标准进行比较。每个报告连续处理三次，选择最频繁的响应作为最终输出。误差分析以模型提供的决策原理为指导。结果：在所测试的115份叙事报告中，96份（83.5%）同时包含英语和韩语。对于SR生成，GPT-4 Turbo和gpt - 40的准确率相当，分别为92.3%[1592/1725]和92.2% [1590/1725]；P = 0.923)。在可切除性分类中，GPT-4 Turbo的准确率高于gpt - 40（分别为81.7%[94/115]和67.0% [77/115]）；P = 0.002)。在GPT-4 Turbo的错误分析中，SR生成错误率为7.7%（133/1725项），主要原因是数据提取不准确（54.1%[72/133]）。可切除性分类错误率为18.3%(21/115)，主要原因是不符合可切除性标准（61.9%[13/21]）。结论：GPT-4 Turbo和gpt - 40在混合语言叙事报告的pdac上生成基于nccn的SRs时都具有可接受的准确性。然而，人类放射科医生的监督对于根据CT结果确定可切除性至关重要。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.

查看原文本刊更多论文

Conversion of Mixed-Language Free-Text CT Reports of Pancreatic Cancer to National Comprehensive Cancer Network Structured Reporting Templates by Using GPT-4.

Objective: To evaluate the feasibility of generative pre-trained transformer-4 (GPT-4) in generating structured reports (SRs) from mixed-language (English and Korean) narrative-style CT reports for pancreatic ductal adenocarcinoma (PDAC) and to assess its accuracy in categorizing PDCA resectability.

Materials and methods: This retrospective study included consecutive free-text reports of pancreas-protocol CT for staging PDAC, from two institutions, written in English or Korean from January 2021 to December 2023. Both the GPT-4 Turbo and GPT-4o models were provided prompts along with the free-text reports via an application programming interface and tasked with generating SRs and categorizing tumor resectability according to the National Comprehensive Cancer Network guidelines version 2.2024. Prompts were optimized using the GPT-4 Turbo model and 50 reports from Institution B. The performances of the GPT-4 Turbo and GPT-4o models in the two tasks were evaluated using 115 reports from Institution A. Results were compared with a reference standard that was manually derived by an abdominal radiologist. Each report was consecutively processed three times, with the most frequent response selected as the final output. Error analysis was guided by the decision rationale provided by the models.

Results: Of the 115 narrative reports tested, 96 (83.5%) contained both English and Korean. For SR generation, GPT-4 Turbo and GPT-4o demonstrated comparable accuracies (92.3% [1592/1725] and 92.2% [1590/1725], respectively; P = 0.923). In the resectability categorization, GPT-4 Turbo showed higher accuracy than GPT-4o (81.7% [94/115] vs. 67.0% [77/115], respectively; P = 0.002). In the error analysis of GPT-4 Turbo, the SR generation error rate was 7.7% (133/1725 items), which was primarily attributed to inaccurate data extraction (54.1% [72/133]). The resectability categorization error rate was 18.3% (21/115), with the main cause being violation of the resectability criteria (61.9% [13/21]).

Conclusion: Both GPT-4 Turbo and GPT-4o demonstrated acceptable accuracy in generating NCCN-based SRs on PDACs from mixed-language narrative reports. However, oversight by human radiologists is essential for determining resectability based on CT findings.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Korean Journal of Radiology 医学-核医学

CiteScore

10.60

自引率

12.50%

发文量

141

审稿时长

1.3 months

期刊介绍： The inaugural issue of the Korean J Radiol came out in March 2000. Our journal aims to produce and propagate knowledge on radiologic imaging and related sciences. A unique feature of the articles published in the Journal will be their reflection of global trends in radiology combined with an East-Asian perspective. Geographic differences in disease prevalence will be reflected in the contents of papers, and this will serve to enrich our body of knowledge. World''s outstanding radiologists from many countries are serving as editorial board of our journal.