Can ChatGPT Fool the Match? Artificial Intelligence Personal Statements for Plastic Surgery Residency Applications: A Comparative Study

Jeffrey Chen, Brendan K. Tao, Shihyun Park, Esta Bovill

Journal: Plastic Surgery (JCR Q4, Surgery; impact factor 0.7)
DOI: 10.1177/22925503241264832
Publication date: 2024-07-23
Publication type: Journal Article
Citations: 0
Abstract
Introduction: Personal statements can be decisive in Canadian residency applications. With the rise of AI technology, ethical concerns regarding authenticity and originality have become more pressing. This study explores the capability of ChatGPT to produce personal statements for plastic surgery residency that match the quality of statements written by successful applicants. Methods: ChatGPT was used to generate a cohort of personal statements for CaRMS (Canadian Residency Matching Service) for comparison with previously successful plastic surgery applications. Each AI-generated and human-written statement was randomized and anonymized prior to assessment. Two retired members of the plastic surgery residency selection committee at the University of British Columbia evaluated the statements on a 0 to 10 scale and provided a binary judgment of whether each statement was AI-generated or human-written. Statistical analysis included Welch two-sample t tests and Cohen's kappa for inter-rater agreement. Results: Twenty-two personal statements (11 AI-generated by ChatGPT and 11 human-written) were evaluated. The overall mean scores were 7.48 (SD 0.932) and 7.68 (SD 0.716), respectively, with no significant difference between the AI and human groups (P = .4129). The average accuracy in distinguishing between human and AI letters was 65.9%, and the Cohen's kappa value was 0.374. Conclusions: ChatGPT can generate personal statements for plastic surgery residency applications of quality indistinguishable from human-written counterparts, as evidenced by the lack of a significant scoring difference and only moderate discrimination accuracy by experienced surgeons. These findings highlight the evolving role of AI and the need for updated evaluative criteria or guidelines in the residency application process.
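The two statistics the abstract reports (a Welch two-sample t test on the 0-10 quality scores, and Cohen's kappa for agreement on the AI-vs-human judgments) can be sketched in plain Python. The data below are illustrative placeholders, not the study's raw scores.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Welch two-sample t statistic (equal variances not assumed)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two sets of binary (0/1) labels."""
    n = len(rater1)
    p_obs = sum(x == y for x, y in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label proportions.
    p1, p2 = sum(rater1) / n, sum(rater2) / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical 11-vs-11 quality scores, mirroring the study design only.
ai_scores = [7.5, 8.0, 6.5, 7.0, 8.5, 7.5, 6.0, 8.0, 7.0, 7.5, 8.0]
human_scores = [8.0, 7.5, 7.0, 8.5, 7.5, 8.0, 6.5, 8.0, 7.5, 8.5, 7.0]
t_stat = welch_t(ai_scores, human_scores)
```

In practice, libraries such as SciPy provide these tests with p-values included (e.g. `scipy.stats.ttest_ind(a, b, equal_var=False)` for Welch's t test); the hand-rolled versions above only show what is being computed.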
Journal Description
Plastic Surgery (Chirurgie Plastique) is the official journal of the Canadian Society of Plastic Surgeons, the Canadian Society for Aesthetic Plastic Surgery, Group for the Advancement of Microsurgery, and the Canadian Society for Surgery of the Hand. It serves as a major venue for Canadian research, society guidelines, and continuing medical education.