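ChatGPT: Evaluating answers on contrast media related questions and finetuning by providing the model with the ESUR guideline on contrast agents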

Impact factor: 1.5 | JCR: Q3 | Category: Radiology, Nuclear Medicine & Medical Imaging
Michael Scheschenja, Moritz B. Bastian, Joel Wessendorf, Andreas D. Owczarek, Alexander M. König, Simon Viniol, Andreas H. Mahnken
Journal: Current Problems in Diagnostic Radiology, volume 53, issue 4, pages 488-493
Published: 2024-04-21
Publication type: Journal Article
DOI: 10.1067/j.cpradiol.2024.04.005
Full text: https://www.sciencedirect.com/science/article/pii/S0363018824000756
Citations: 0

Abstract


Objective

This study aimed to assess the feasibility of GPT-4 for answering questions related to contrast media with and without the context of the European Society of Urogenital Radiology (ESUR) guideline on contrast agents. The overarching goal was to determine whether contextual enrichment by providing guideline information improves answers of GPT-4 for clinical decision-making in radiology.

Methods

A set of 64 questions, based on the ESUR guideline on contrast agents and mirroring its pertinent sections, was developed and posed to GPT-4 both directly and after providing the guideline using a plugin. Responses were graded on Likert scales: by experienced radiologists for quality of information and for accuracy in pinpointing information from the guideline, and by radiology residents for utility.
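The two conditions described above (question posed directly, and question posed with guideline context) can be sketched as a pair of prompts. This is only an illustration: the study supplied the guideline through a plugin whose mechanism is not described here, so the function name `build_prompt`, the prompt wording, and the placeholder strings are all assumptions rather than the authors' setup.

```python
from typing import Optional


def build_prompt(question: str, guideline_excerpt: Optional[str] = None) -> str:
    """Return a prompt for one question, optionally enriched with guideline text.

    Hypothetical helper: the wording and layout of the prompt are assumptions,
    not taken from the study.
    """
    if guideline_excerpt is None:
        # Condition 1: the question is posed directly, with no added context.
        return f"Question: {question}\nAnswer:"
    # Condition 2: the guideline text is placed ahead of the same question.
    return (
        "Use the following guideline excerpt as context.\n"
        f"--- guideline excerpt ---\n{guideline_excerpt}\n--- end of excerpt ---\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Placeholder strings only; no real guideline content is reproduced here.
    question = "<contrast media question>"
    excerpt = "<relevant section of the guideline>"
    print(build_prompt(question))
    print()
    print(build_prompt(question, excerpt))
```

Keeping the question text identical in both prompts means any difference in the graded answers can be attributed to the added context rather than to rewording.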

Results

GPT-4's performance improved significantly with the guideline. Without the guideline, average quality rating was 3.98, which increased to 4.33 with the guideline (p = 0036). In terms of accuracy, 82.3% of answers matched the information from the guideline. Utility scores also reflected a significant improvement with the guideline, with average scores of 4.1 (without) and 4.4 (with) (p = 0.008); inter-rater agreement, measured by Fleiss' kappa, was 0.44.
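For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' kappa is computed from a table of ratings. The function name `fleiss_kappa` and the example scores are hypothetical; they are not the study's data, and the study's own analysis software is not stated in this abstract.

```python
from collections import Counter
from typing import List, Sequence


def fleiss_kappa(ratings: List[Sequence[int]], categories: Sequence[int]) -> float:
    """Fleiss' kappa for items each scored by the same number of raters.

    ratings[i] holds the scores that all raters gave to item i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical 5-point Likert scores from three raters on five answers.
    example = [[4, 4, 5], [3, 4, 4], [5, 5, 5], [2, 3, 3], [4, 5, 4]]
    print(round(fleiss_kappa(example, categories=[1, 2, 3, 4, 5]), 3))
```

A value of 1 indicates complete agreement, and values at or below 0 indicate agreement no better than chance.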

Conclusion

GPT-4, when contextually enriched with a guideline, demonstrates enhanced capability in providing guideline-backed recommendations. This approach holds promise for real-time clinical decision support, making guidelines more actionable. However, further refinements are necessary to maximize the potential of large language models (LLMs), and their inherent limitations need to be addressed.

Source journal
Current Problems in Diagnostic Radiology (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 3.00
Self-citation rate: 0.00%
Articles published: 113
Review time: 46 days
About the journal: Current Problems in Diagnostic Radiology covers important and controversial topics in radiology. Each issue presents important viewpoints from leading radiologists. High-quality reproductions of radiographs, CT scans, MR images, and sonograms clearly depict what is being described in each article. Also included are valuable updates relevant to other areas of practice, such as medical-legal issues or archiving systems. With new multi-topic format and image-intensive style, Current Problems in Diagnostic Radiology offers an outstanding, time-saving investigation into current topics most relevant to radiologists.