Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses From Closed-Domain Large Language Models Versus Clinical Teams

Yuexing Hao MS, Jason Holmes PhD, Jared Hobson MD, Alexandra Bennett MD, Elizabeth L. McKone MD, Daniel K. Ebner MD, David M. Routman MD, Satomi Shiraishi MD, Samir H. Patel MD, Nathan Y. Yu MD, Chris L. Hallemeier MD, Brooke E. Ball MSN, Mark Waddle MD, Wei Liu PhD
Mayo Clinic Proceedings. Digital health, 3(1), Article 100198. Published 2025-03-01.
DOI: 10.1016/j.mcpdig.2025.100198
https://www.sciencedirect.com/science/article/pii/S2949761225000057

Abstract

Objective

To evaluate the effectiveness of RadOnc-GPT (generative pretrained transformer), a GPT-4-based large language model, in drafting responses to in-basket messages about prostate cancer treatment, with the goal of reducing the workload and time burden on clinical care teams while maintaining response quality.

Patients and Methods

RadOnc-GPT was integrated with electronic health records from both Mayo Clinic-wide databases and a radiation-oncology-specific database. The model was evaluated on 158 previously recorded in-basket message interactions from 90 patients with nonmetastatic prostate cancer, drawn from the Mayo Clinic Department of Radiation Oncology in-basket message database for calendar years 2022-2024. RadOnc-GPT's responses were assessed with quantitative natural language processing analysis and 2 grading studies conducted by 5 clinicians and 4 nurses. Three primary clinicians independently graded all messages, a fourth senior clinician reviewed 41 responses with relevant discrepancies, and a fifth senior clinician evaluated 2 additional responses. The grading focused on 5 key areas: completeness, correctness, clarity, empathy, and editing time. The grading study was performed from July 20, 2024 to December 15, 2024.

Results

RadOnc-GPT slightly outperformed the clinical care team in empathy while achieving comparable scores in completeness, correctness, and clarity. The 5 clinician graders identified key limitations in RadOnc-GPT's responses, including lack of context, insufficient domain-specific knowledge, inability to perform essential meta-tasks, and hallucination. RadOnc-GPT was estimated to save an average of 5.2 minutes per message for nurses and 2.4 minutes per message for clinicians, measured from reading the inquiry to sending the response.
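
The per-message estimates above can be scaled to a message volume with simple arithmetic. The following is a minimal sketch, not part of the study: the two constants are the figures reported in the abstract, while the function name `projected_hours_saved` and the `nurse_share` parameter (the fraction of messages answered by nurses) are hypothetical, since the abstract does not state how messages are split between nurses and clinicians.

```python
# Average minutes saved per message, as reported in the abstract.
NURSE_MINUTES_SAVED = 5.2
CLINICIAN_MINUTES_SAVED = 2.4


def projected_hours_saved(n_messages: int, nurse_share: float) -> float:
    """Estimate total hours saved for a batch of messages.

    nurse_share is a hypothetical input: the fraction of messages handled
    by nurses, with the remainder handled by clinicians.
    """
    if n_messages < 0:
        raise ValueError("n_messages must be non-negative")
    if not 0.0 <= nurse_share <= 1.0:
        raise ValueError("nurse_share must be between 0 and 1")
    # Weighted average of the two reported per-message savings.
    minutes_per_message = (
        nurse_share * NURSE_MINUTES_SAVED
        + (1.0 - nurse_share) * CLINICIAN_MINUTES_SAVED
    )
    return n_messages * minutes_per_message / 60.0


if __name__ == "__main__":
    # Bounds for the 158 messages in the evaluation set.
    print(f"all nurses:     {projected_hours_saved(158, 1.0):.2f} h")
    print(f"all clinicians: {projected_hours_saved(158, 0.0):.2f} h")
```

Because the split between nurses and clinicians is unknown, the two extreme values of `nurse_share` only bound the projection; they are not results from the study.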

Conclusion

RadOnc-GPT has the potential to considerably reduce the workload of clinical care teams by generating high-quality, timely responses to in-basket messages. This could improve the efficiency of health care workflows and reduce costs while maintaining or enhancing the quality of communication between patients and health care providers.
Source journal: Mayo Clinic Proceedings. Digital health
Subject areas: Medicine and Dentistry (General), Health Informatics, Public Health and Health Policy