Prompt engineering to increase GPT3.5’s performance on the Plastic Surgery In-Service Exams
Journal of Plastic Reconstructive and Aesthetic Surgery (Journal Article), published 2024-09-05
DOI: 10.1016/j.bjps.2024.09.001
URL: https://www.sciencedirect.com/science/article/pii/S1748681524005503
Citation count: 0
Abstract
This study assesses ChatGPT's (GPT-3.5) performance on the 2021 ASPS Plastic Surgery In-Service Examination using prompt modifications and Retrieval Augmented Generation (RAG). ChatGPT was instructed to act as a "resident," "attending," or "medical student," and RAG utilized a curated vector database for context. Results showed no significant improvement, with the "resident" prompt yielding the highest accuracy at 54%, and RAG failing to enhance performance, with accuracy remaining at 54.3%. Despite appropriate reasoning when correct, ChatGPT's overall performance fell in the 10th percentile, indicating the need for fine-tuning and more sophisticated approaches to improve AI's utility in complex medical tasks.
About the journal:
JPRAS An International Journal of Surgical Reconstruction is one of the world's leading international journals, covering all the reconstructive and aesthetic aspects of plastic surgery.
The journal presents the latest surgical procedures with audit and outcome studies of new and established techniques in plastic surgery, including cleft lip and palate and other head and neck surgery, hand surgery, lower limb trauma, burns, skin cancer, breast surgery and aesthetic surgery.