Artificial Intelligence vs Human Authorship in Spine Surgery Fellowship Personal Statements: Can ChatGPT Outperform Applicants?
William J Karakash, Henry Avetisian, Jonathan M Ragheb, Jeffrey C Wang, Raymond J Hah, Ram K Alluri
Global Spine Journal, published 2025-05-20. DOI: 10.1177/21925682251344248
Abstract
Study Design: A comparative analysis of AI-generated vs human-authored personal statements for spine surgery fellowship applications.

Objective: To assess whether evaluators could differentiate between ChatGPT- and human-authored personal statements and to determine whether AI-generated statements could outperform human-authored ones on quality metrics.

Summary of Background Data: Personal statements are key in fellowship admissions, but the rise of AI tools like ChatGPT raises concerns about their use. While previous studies have examined AI-generated residency statements, their role in spine fellowship applications remains unexplored.

Methods: Nine personal statements (4 ChatGPT-generated, 5 human-authored) were evaluated by 8 blinded reviewers (6 attending spine surgeons and 2 fellows). ChatGPT-4o was prompted to create statements focused on 4 unique experiences. Evaluators rated each statement for readability, originality, quality, and authenticity (0-100 scale), determined AI authorship, and indicated interview recommendations.

Results: ChatGPT-authored statements scored higher in readability (65.69 vs 56.40, P = 0.016) and quality (63.00 vs 51.80, P = 0.004) but showed no differences in originality (P = 0.339) or authenticity (P = 0.256). Reviewers could not reliably distinguish AI from human authorship (P = 1.000). Interview recommendations favored ChatGPT-generated statements (84.4% vs 62.5%, OR: 3.24 [1.08-11.17], P = 0.045).

Conclusion: ChatGPT can produce high-quality, indistinguishable spine fellowship personal statements that increase interview likelihood. These findings highlight the need for nuanced guidelines regarding AI use in application processes, particularly considering its potential role in expanding access to high-quality writing assistance and editing.
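The reported odds ratio is consistent with the two interview-recommendation rates. As a minimal sanity check, assuming the OR is the crude (unadjusted) value computed directly from those rates:

$$
\mathrm{OR} = \frac{0.844 / (1 - 0.844)}{0.625 / (1 - 0.625)} = \frac{5.41}{1.67} \approx 3.24
$$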
About the Journal
Global Spine Journal (GSJ) is the official scientific publication of AOSpine. It is a peer-reviewed, open access journal devoted to the study and treatment of spinal disorders, including diagnosis, operative and non-operative treatment options, surgical techniques, and emerging research and clinical developments. GSJ is indexed in PubMed Central, SCOPUS, and the Emerging Sources Citation Index (ESCI).