Prompt engineering to increase GPT3.5’s performance on the Plastic Surgery In-Service Exams

Journal: Journal of Plastic Reconstructive and Aesthetic Surgery
DOI: 10.1016/j.bjps.2024.09.001
Published: 2024-09-05
Full text: https://www.sciencedirect.com/science/article/pii/S1748681524005503
Impact Factor: 2.0 · CAS Tier 3 (Medicine) · JCR Q2 (Surgery)
Citations: 0

Abstract

This study assesses ChatGPT's (GPT-3.5) performance on the 2021 ASPS Plastic Surgery In-Service Examination using prompt modifications and Retrieval Augmented Generation (RAG). ChatGPT was instructed to act as a "resident," "attending," or "medical student," and RAG drew context from a curated vector database. Neither approach produced a significant improvement: the "resident" prompt yielded the highest accuracy at 54%, and RAG failed to enhance performance, with accuracy remaining at 54.3%. Although ChatGPT reasoned appropriately when it answered correctly, its overall performance fell in the 10th percentile, indicating the need for fine-tuning and more sophisticated approaches to improve AI's utility in complex medical tasks.
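The abstract only names the two interventions; as a rough illustration, the sketch below shows how persona prompting and a lightweight RAG step over a small vector store could be wired together with GPT-3.5. The SDK calls, model names, reference passages, and question format are assumptions for illustration, not the study's actual pipeline or curated database.

```python
# A minimal sketch, assuming the OpenAI Python SDK (>=1.0) and NumPy, of the two
# strategies the abstract describes: persona ("resident"/"attending"/"medical
# student") prompting and a simple Retrieval Augmented Generation step over a
# small vector store. Model names, reference passages, and the question format
# are illustrative assumptions, not the authors' actual setup.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(texts):
    """Embed passages with an OpenAI embedding model (assumed choice of model)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


# Toy stand-in for a "curated vector database": reference passages plus embeddings.
passages = [
    "Tissue expansion relies on mechanical creep and biological growth of skin.",
    "The radial forearm free flap is based on the radial artery and its venae comitantes.",
]
passage_vectors = embed(passages)


def retrieve(question, k=1):
    """Return the k passages most similar to the question (cosine similarity)."""
    q = embed([question])[0]
    sims = passage_vectors @ q / (
        np.linalg.norm(passage_vectors, axis=1) * np.linalg.norm(q)
    )
    return [passages[i] for i in np.argsort(sims)[::-1][:k]]


def answer(question, persona="resident", use_rag=False):
    """Ask GPT-3.5 an exam-style question under a given persona, optionally with RAG."""
    system = f"You are a plastic surgery {persona}. Answer with the single best option letter."
    user = question
    if use_rag:
        user = "Context:\n" + "\n".join(retrieve(question)) + "\n\nQuestion:\n" + question
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content


# Hypothetical usage with an invented exam-style item:
# print(answer("Which vessel supplies the radial forearm flap? "
#              "A) Ulnar artery B) Radial artery C) Brachial artery",
#              persona="attending", use_rag=True))
```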

Source journal
CiteScore: 3.10
Self-citation rate: 11.10%
Articles published: 578
Review time: 3.5 months
Journal description: JPRAS, An International Journal of Surgical Reconstruction, is one of the world's leading international journals, covering all the reconstructive and aesthetic aspects of plastic surgery. The journal presents the latest surgical procedures with audit and outcome studies of new and established techniques in plastic surgery, including: cleft lip and palate and other head and neck surgery, hand surgery, lower limb trauma, burns, skin cancer, breast surgery and aesthetic surgery.