Proof-of-concept study of a small language model chatbot for breast cancer decision support - a transparent, source-controlled, explainable and data-secure approach.

IF 2.7 3区 医学 Q3 ONCOLOGY
Sebastian Griewing, Fabian Lechner, Niklas Gremke, Stefan Lukac, Wolfgang Janni, Markus Wallwiener, Uwe Wagner, Martin Hirsch, Sebastian Kuhn
{"title":"Proof-of-concept study of a small language model chatbot for breast cancer decision support - a transparent, source-controlled, explainable and data-secure approach.","authors":"Sebastian Griewing, Fabian Lechner, Niklas Gremke, Stefan Lukac, Wolfgang Janni, Markus Wallwiener, Uwe Wagner, Martin Hirsch, Sebastian Kuhn","doi":"10.1007/s00432-024-05964-3","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Large language models (LLM) show potential for decision support in breast cancer care. Their use in clinical care is currently prohibited by lack of control over sources used for decision-making, explainability of the decision-making process and health data security issues. Recent development of Small Language Models (SLM) is discussed to address these challenges. This preclinical proof-of-concept study tailors an open-source SLM to the German breast cancer guideline (BC-SLM) to evaluate initial clinical accuracy and technical functionality in a preclinical simulation.</p><p><strong>Methods: </strong>A multidisciplinary tumor board (MTB) is used as the gold-standard to assess the initial clinical accuracy in terms of concordance of the BC-SLM with MTB and comparing it to two publicly available LLM, ChatGPT3.5 and 4. The study includes 20 fictional patient profiles and recommendations for 5 treatment modalities, resulting in 100 binary treatment recommendations (recommended or not recommended). Statistical evaluation includes concordance with MTB in % including Cohen's Kappa statistic (κ). Technical functionality is assessed qualitatively in terms of local hosting, adherence to the guideline and information retrieval.</p><p><strong>Results: </strong>The overall concordance amounts to 86% for BC-SLM (κ = 0.721, p < 0.001), 90% for ChatGPT4 (κ = 0.820, p < 0.001) and 83% for ChatGPT3.5 (κ = 0.661, p < 0.001). Specific concordance for each treatment modality ranges from 65 to 100% for BC-SLM, 85-100% for ChatGPT4, and 55-95% for ChatGPT3.5. The BC-SLM is locally functional, adheres to the standards of the German breast cancer guideline and provides referenced sections for its decision-making.</p><p><strong>Conclusion: </strong>The tailored BC-SLM shows initial clinical accuracy and technical functionality, with concordance to the MTB that is comparable to publicly-available LLMs like ChatGPT4 and 3.5. This serves as a proof-of-concept for adapting a SLM to an oncological disease and its guideline to address prevailing issues with LLM by ensuring decision transparency, explainability, source control, and data security, which represents a necessary step towards clinical validation and safe use of language models in clinical oncology.</p>","PeriodicalId":15118,"journal":{"name":"Journal of Cancer Research and Clinical Oncology","volume":"150 10","pages":"451"},"PeriodicalIF":2.7000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464535/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cancer Research and Clinical Oncology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00432-024-05964-3","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ONCOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: Large language models (LLM) show potential for decision support in breast cancer care. Their use in clinical care is currently prohibited by lack of control over sources used for decision-making, explainability of the decision-making process and health data security issues. Recent development of Small Language Models (SLM) is discussed to address these challenges. This preclinical proof-of-concept study tailors an open-source SLM to the German breast cancer guideline (BC-SLM) to evaluate initial clinical accuracy and technical functionality in a preclinical simulation.

Methods: A multidisciplinary tumor board (MTB) is used as the gold-standard to assess the initial clinical accuracy in terms of concordance of the BC-SLM with MTB and comparing it to two publicly available LLM, ChatGPT3.5 and 4. The study includes 20 fictional patient profiles and recommendations for 5 treatment modalities, resulting in 100 binary treatment recommendations (recommended or not recommended). Statistical evaluation includes concordance with MTB in % including Cohen's Kappa statistic (κ). Technical functionality is assessed qualitatively in terms of local hosting, adherence to the guideline and information retrieval.

Results: The overall concordance amounts to 86% for BC-SLM (κ = 0.721, p < 0.001), 90% for ChatGPT4 (κ = 0.820, p < 0.001) and 83% for ChatGPT3.5 (κ = 0.661, p < 0.001). Specific concordance for each treatment modality ranges from 65 to 100% for BC-SLM, 85-100% for ChatGPT4, and 55-95% for ChatGPT3.5. The BC-SLM is locally functional, adheres to the standards of the German breast cancer guideline and provides referenced sections for its decision-making.

Conclusion: The tailored BC-SLM shows initial clinical accuracy and technical functionality, with concordance to the MTB that is comparable to publicly-available LLMs like ChatGPT4 and 3.5. This serves as a proof-of-concept for adapting a SLM to an oncological disease and its guideline to address prevailing issues with LLM by ensuring decision transparency, explainability, source control, and data security, which represents a necessary step towards clinical validation and safe use of language models in clinical oncology.

用于乳腺癌决策支持的小语言模型聊天机器人概念验证研究--一种透明、源控制、可解释和数据安全的方法。
目的:大型语言模型(LLM)在乳腺癌治疗决策支持方面显示出潜力。目前,由于缺乏对决策来源的控制、决策过程的可解释性以及健康数据的安全性问题,这些模型在临床护理中的使用受到了限制。本文讨论了小语言模型(SLM)的最新发展,以应对这些挑战。这项临床前概念验证研究根据德国乳腺癌指南(BC-SLM)定制了一个开源的小语言模型,以评估临床前模拟的初步临床准确性和技术功能:将多学科肿瘤委员会(MTB)作为黄金标准,从 BC-SLM 与 MTB 的一致性方面评估初始临床准确性,并将其与两款公开可用的 LLM(ChatGPT3.5 和 4)进行比较。研究包括 20 份虚构的患者资料和 5 种治疗方式的建议,最终得出 100 项二元治疗建议(建议或不建议)。统计评估包括与 MTB 的一致性(%),包括 Cohen's Kappa 统计量 (κ)。对技术功能进行了定性评估,包括本地托管、遵守指南和信息检索:结果:BC-SLM 的总体一致性达到 86%(κ = 0.721,p 结论:BC-SLM 的总体一致性达到 86%(κ = 0.721,p 结论):量身定制的 BC-SLM 显示了初步的临床准确性和技术功能,与 MTB 的一致性可与 ChatGPT4 和 3.5 等公开发布的 LLM 相媲美。这是将 SLM 适应于肿瘤疾病的概念验证,也是通过确保决策透明度、可解释性、源控制和数据安全性来解决 LLM 普遍存在的问题的指南,是实现临床验证和在临床肿瘤学中安全使用语言模型的必要步骤。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
4.00
自引率
2.80%
发文量
577
审稿时长
2 months
期刊介绍: The "Journal of Cancer Research and Clinical Oncology" publishes significant and up-to-date articles within the fields of experimental and clinical oncology. The journal, which is chiefly devoted to Original papers, also includes Reviews as well as Editorials and Guest editorials on current, controversial topics. The section Letters to the editors provides a forum for a rapid exchange of comments and information concerning previously published papers and topics of current interest. Meeting reports provide current information on the latest results presented at important congresses. The following fields are covered: carcinogenesis - etiology, mechanisms; molecular biology; recent developments in tumor therapy; general diagnosis; laboratory diagnosis; diagnostic and experimental pathology; oncologic surgery; and epidemiology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信