Can Artificial Intelligence Deceive Residency Committees? A Randomized Multicenter Analysis of Letters of Recommendation.

IF 2.6 · JCR Q1 (Orthopedics) · CAS Tier 2 (Medicine)
Samuel K Simister, Eric G Huish, Eugene Y Tsai, Hai V Le, Andrea Halim, Dominick Tuason, John P Meehan, Holly B Leshikar, Augustine M Saiz, Zachary C Lum
{"title":"Can Artificial Intelligence Deceive Residency Committees? A Randomized Multicenter Analysis of Letters of Recommendation.","authors":"Samuel K Simister, Eric G Huish, Eugene Y Tsai, Hai V Le, Andrea Halim, Dominick Tuason, John P Meehan, Holly B Leshikar, Augustine M Saiz, Zachary C Lum","doi":"10.5435/JAAOS-D-24-00438","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>The introduction of generative artificial intelligence (AI) may have a profound effect on residency applications. In this study, we explore the abilities of AI-generated letters of recommendation (LORs) by evaluating the accuracy of orthopaedic surgery residency selection committee members to identify LORs written by human or AI authors.</p><p><strong>Methods: </strong>In a multicenter, single-blind trial, a total of 45 LORs (15 human, 15 ChatGPT, and 15 Google BARD) were curated. In a random fashion, seven faculty reviewers from four residency programs were asked to grade each of the 45 LORs based on the 11 characteristics outlined in the American Orthopaedic Associations standardized LOR, as well as a 1 to 10 scale on how they would rank the applicant, their desire of having the applicant in the program, and if they thought the letter was generated by a human or AI author. Analysis included descriptives, ordinal regression, and a receiver operator characteristic curve to compare accuracy based on the number of letters reviewed.</p><p><strong>Results: </strong>Faculty reviewers correctly identified 40% (42/105) of human-generated and 63% (132/210) of AI-generated letters ( P < 0.001), which did not increase over time (AUC 0.451, P = 0.102). When analyzed by perceived author, letters marked as human generated had significantly higher means for all variables ( P = 0.01). BARD did markedly better than human authors in accuracy (3.25 [1.79 to 5.92], P < 0.001), adaptability (1.29 [1.02 to 1.65], P = 0.034), and perceived commitment (1.56 [0.99 to 2.47], P < 0.055). Additional analysis controlling for reviewer background showed no differences in outcomes based on experience or familiarity with the AI programs.</p><p><strong>Conclusion: </strong>Faculty members were unsuccessful in determining the difference between human-generated and AI-generated LORs 50% of the time, which suggests that AI can generate LORs similarly to human authors. This highlights the importance for selection committees to reconsider the role and influence of LORs on residency applications.</p>","PeriodicalId":51098,"journal":{"name":"Journal of the American Academy of Orthopaedic Surgeons","volume":" ","pages":"e348-e355"},"PeriodicalIF":2.6000,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Academy of Orthopaedic Surgeons","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5435/JAAOS-D-24-00438","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/12/12 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction: The introduction of generative artificial intelligence (AI) may have a profound effect on residency applications. In this study, we explore the capabilities of AI-generated letters of recommendation (LORs) by evaluating the accuracy of orthopaedic surgery residency selection committee members in identifying LORs written by human or AI authors.

Methods: In a multicenter, single-blind trial, a total of 45 LORs (15 human, 15 ChatGPT, and 15 Google BARD) were curated. Seven faculty reviewers from four residency programs were asked to grade each of the 45 LORs, presented in random order, on the 11 characteristics outlined in the American Orthopaedic Association's standardized LOR, as well as on 1 to 10 scales for how they would rank the applicant and their desire to have the applicant in the program, and to judge whether the letter was generated by a human or an AI author. Analysis included descriptive statistics, ordinal regression, and a receiver operating characteristic (ROC) curve comparing identification accuracy against the number of letters reviewed.
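The abstract does not include the authors' analysis code. As a rough, hypothetical sketch of how the two named analyses could be set up in Python, assume a long-format table of the 315 reviews (7 reviewers × 45 letters) with invented column names; neither the file nor the columns come from the study.

```python
# Hypothetical sketch of the abstract's two named analyses; the CSV file
# and all column names are invented for illustration.
import pandas as pd
from sklearn.metrics import roc_auc_score
from statsmodels.miscmodels.ordinal_model import OrderedModel

reviews = pd.read_csv("lor_reviews.csv")  # 315 rows: 7 reviewers x 45 letters

# ROC curve of accuracy vs. number of letters reviewed: does the position
# of a letter in a reviewer's queue (1..45) predict a correct authorship
# call? An AUC near 0.5, like the reported 0.451, indicates no learning effect.
auc = roc_auc_score(reviews["correct"], reviews["review_order"])

# Ordinal regression: the 1-10 applicant ranking as a function of the
# letter's true author (ChatGPT / BARD vs. the human reference category).
author = pd.get_dummies(reviews["author"], drop_first=True, dtype=float)
fit = OrderedModel(reviews["applicant_rank"].astype(int), author,
                   distr="logit").fit(method="bfgs", disp=False)

print(f"AUC = {auc:.3f}")
print(fit.summary())
```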

Results: Faculty reviewers correctly identified 40% (42/105) of human-generated and 63% (132/210) of AI-generated letters (P < 0.001), and accuracy did not increase over time (AUC 0.451, P = 0.102). When analyzed by perceived author, letters judged to be human generated had significantly higher means for all variables (P = 0.01). BARD performed markedly better than human authors in accuracy (3.25 [1.79 to 5.92], P < 0.001), adaptability (1.29 [1.02 to 1.65], P = 0.034), and perceived commitment (1.56 [0.99 to 2.47], P < 0.055). Additional analysis controlling for reviewer background showed no differences in outcomes based on experience or familiarity with the AI programs.
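The reported counts allow the headline proportions to be checked directly. The abstract does not name the test behind the P < 0.001 comparison, so the chi-square test in this sketch is an assumption, not the authors' method.

```python
# Verify the identification rates and compare them with a chi-square test
# (an assumed test; the abstract does not specify which was used).
from scipy.stats import chi2_contingency

human_correct, human_total = 42, 105   # 42/105 = 40% of human letters caught
ai_correct, ai_total = 132, 210        # 132/210 = 63% of AI letters caught

table = [[human_correct, human_total - human_correct],
         [ai_correct, ai_total - ai_correct]]
chi2, p, dof, expected = chi2_contingency(table)

print(f"human: {human_correct / human_total:.0%}, AI: {ai_correct / ai_total:.0%}")
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")  # p well below 0.001
```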

Conclusion: Faculty members failed to distinguish human-generated from AI-generated LORs roughly 50% of the time, which suggests that AI can generate LORs similar to those of human authors. This highlights the need for selection committees to reconsider the role and influence of LORs in residency applications.

Source journal: CiteScore 6.10 · Self-citation rate 6.20% · Articles per year: 529 · Review time: 4-8 weeks
About the journal: The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership's demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues. Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.