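Large language model processing capabilities of ChatGPT 4.0 to generate molecular tumor board recommendations-a critical evaluation on real world data.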

IF 4.2 · Zone 2 (Medicine) · JCR Q1 ONCOLOGY
Oncologist · Pub Date: 2025-09-18 · DOI: 10.1093/oncolo/oyaf293
Maximilian Schmutz, Sebastian Sommer, Julia Sander, David Graumann, Johannes Raffler, Iñaki Soto-Rey, Seyedmostafa Sheikhalishahi, Lisa Schmidt, Leonhard Paul Unkelbach, Levent Ortak, Tina Schaller, Sebastian Dintner, Kathrin Hildebrand, Michaela Kuhlen, Frank Jordan, Martin Trepel, Christian Hinske, Rainer Claus
Large language model processing capabilities of ChatGPT 4.0 to generate molecular tumor board recommendations-a critical evaluation on real world data.

Background: Large language models (LLMs) like ChatGPT 4.0 hold promise for enhancing clinical decision-making in precision oncology, particularly within molecular tumor boards (MTBs). This study assesses ChatGPT 4.0's performance in generating therapy recommendations for complex real-world cancer cases compared to expert human MTB (hMTB) teams.

Methods: We retrospectively analyzed 20 anonymized MTB cases from the Comprehensive Cancer Center Augsburg (CCCA), covering breast cancer (n = 3), glioblastoma (n = 3), colorectal cancer (n = 2), and rare tumors. ChatGPT 4.0 recommendations were evaluated against hMTB outputs using metrics including recommendation type (therapeutic/diagnostic), information density (IDM), consistency, quality (level of evidence [LoE]), and efficiency. Each case was prompted thrice to evaluate variability (Fleiss' Kappa).

Results: ChatGPT 4.0 generated more therapeutic recommendations per case than hMTB (median 3 vs. 1, p = 0.005), with comparable diagnostic suggestions (median 1 vs. 2, p = 0.501). Therapeutic scope from ChatGPT 4.0 included off-label and clinical trial options. IDM scores indicated similar content depth between ChatGPT 4.0 (median 0.67) and hMTB (median 0.75; p = 0.084). Moderate consistency was observed across replicate runs (median Fleiss' Kappa=0.51). ChatGPT 4.0 occasionally utilized lower-level or preclinical evidence more frequently (p = 0.0019). Efficiency favored ChatGPT 4.0 significantly (median 15.2 vs. 34.7 minutes; p < 0.001).

Conclusion: Incorporating ChatGPT 4.0 into MTB workflows enhances efficiency and provides relevant recommendations, especially in guideline-supported cases. However, variability in evidence prioritization highlights the need for ongoing human oversight. A hybrid approach, integrating human expertise with LLM support, may optimize precision oncology decision-making.
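
The Methods state that each case was prompted three times and that variability was summarized with Fleiss' Kappa. The abstract does not describe how the replicate outputs were coded, so the following is only a minimal sketch of the statistic itself, assuming each replicate run assigns one categorical label to each rated item. The function name `fleiss_kappa` and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import nan


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings:    list of items; each item is a list of labels, one per rater
                (here, hypothetically, one label per replicate prompt).
    categories: list of all possible labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who put item i into category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement per item, then averaged over items
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category
    total = n_items * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        return nan  # undefined when only one category is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical coding: three replicate runs label each item as a
# therapeutic ("T") or diagnostic ("D") recommendation.
example = [["T", "T", "T"], ["T", "T", "D"], ["D", "D", "D"], ["T", "D", "D"]]
kappa = fleiss_kappa(example, ["T", "D"])
```

Values near 1 indicate that the replicate runs almost always agree, values near 0 indicate agreement no better than chance, and the reported median of 0.51 falls in the range conventionally described as moderate.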

Source journal: Oncologist (Medicine, Oncology)
CiteScore: 10.40
Self-citation rate: 3.40%
Articles published: 309
Review time: 3-8 weeks
Journal description: The Oncologist® is dedicated to translating the latest research developments into the best multidimensional care for cancer patients. Thus, The Oncologist is committed to helping physicians excel in this ever-expanding environment through the publication of timely reviews, original studies, and commentaries on important developments. We believe that the practice of oncology requires both an understanding of a range of disciplines encompassing basic science related to cancer, translational research, and clinical practice, but also the socioeconomic and psychosocial factors that determine access to care and quality of life and function following cancer treatment.