Prompt engineering to increase GPT3.5's performance on the Plastic Surgery In-Service Exams

Impact factor: 2.0 | Zone 3, Medicine | JCR Q2, Surgery

Journal of Plastic Reconstructive and Aesthetic Surgery | Pub Date: 2024-09-05 | DOI: 10.1016/j.bjps.2024.09.001

Article URL: https://www.sciencedirect.com/science/article/pii/S1748681524005503

Citations: 0

Abstract

Prompt engineering to increase GPT3.5’s performance on the Plastic Surgery In-Service Exams

This study assesses ChatGPT's (GPT-3.5) performance on the 2021 ASPS Plastic Surgery In-Service Examination using prompt modifications and Retrieval Augmented Generation (RAG). ChatGPT was instructed to act as a "resident," "attending," or "medical student," and RAG utilized a curated vector database for context. Results showed no significant improvement, with the "resident" prompt yielding the highest accuracy at 54%, and RAG failing to enhance performance, with accuracy remaining at 54.3%. Despite appropriate reasoning when correct, ChatGPT's overall performance fell in the 10th percentile, indicating the need for fine-tuning and more sophisticated approaches to improve AI's utility in complex medical tasks.
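The abstract does not give the exact prompts or the retrieval pipeline, so the following is only a minimal sketch of the general setup it describes: role-conditioned prompts, optional retrieved context, and accuracy scoring against an answer key. It is not the authors' implementation. All function names, the prompt wording, and the placeholder passages are assumptions, and a simple bag-of-words cosine similarity stands in for the embedding-based vector database a real RAG system would use.

```python
import math
from collections import Counter

# Roles named in the abstract; the prompt wording below is an assumption.
ROLES = ("medical student", "resident", "attending")


def build_prompt(role, question, context=None):
    """Compose a role-conditioned prompt, optionally with retrieved context."""
    parts = [f"Act as a {role} taking a plastic surgery in-service examination."]
    if context:
        parts.append(f"Reference material:\n{context}")
    parts.append(f"Question:\n{question}\nAnswer with a single letter.")
    return "\n\n".join(parts)


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages):
    """Return the passage most similar to the question."""
    query = bag_of_words(question)
    return max(passages, key=lambda p: cosine(query, bag_of_words(p)))


def accuracy(predicted, answer_key):
    """Fraction of predicted letters that match the answer key."""
    correct = sum(p == k for p, k in zip(predicted, answer_key))
    return correct / len(answer_key)


if __name__ == "__main__":
    # Placeholder passages and question, purely illustrative.
    passages = [
        "notes on hand surgery and tendon repair",
        "notes on lower limb trauma reconstruction",
    ]
    question = "question about lower limb reconstruction options"
    context = retrieve(question, passages)
    for role in ROLES:
        print(build_prompt(role, question, context))
        print("-" * 40)
```

In a setup like this, each prompt variant would be sent to the model for every exam question, and `accuracy` would be computed per variant so that the role prompts and the retrieval-augmented condition can be compared on the same answer key.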

Source journal

Journal of Plastic Reconstructive and Aesthetic Surgery (Medicine: Surgery)

CiteScore: 3.10

Self-citation rate: 11.10%

Articles published: 578

Review time: 3.5 months

Journal description: JPRAS, An International Journal of Surgical Reconstruction, is one of the world's leading international journals, covering all the reconstructive and aesthetic aspects of plastic surgery. The journal presents the latest surgical procedures with audit and outcome studies of new and established techniques in plastic surgery, including cleft lip and palate and other head and neck surgery, hand surgery, lower limb trauma, burns, skin cancer, breast surgery, and aesthetic surgery.