Human or Machine? A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements.

Margaret A Goodman, Anthony M Lee, Zachary Schreck, John H Hollman
{"title":"Human or Machine? A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements.","authors":"Margaret A Goodman, Anthony M Lee, Zachary Schreck, John H Hollman","doi":"10.1097/JTE.0000000000000396","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>This study examines the ability of human readers, recurrence quantification analysis (RQA), and an online artificial intelligence (AI) detection tool (GPTZero) to distinguish between AI-generated and human-written personal statements in physical therapist education program applications.</p><p><strong>Review of literature: </strong>The emergence of large language models such as ChatGPT and Google Gemini has raised concerns about the authenticity of personal statements. Previous studies have reported varying degrees of success in detecting AI-generated text.</p><p><strong>Subjects: </strong>Data were collected from 50 randomly selected nonmatriculated individuals who applied to the Mayo Clinic School of Health Sciences Doctor of Physical Therapy Program during the 2021-2022 application cycle.</p><p><strong>Methods: </strong>Fifty personal statements from applicants were pooled with 50 Google Gemini-generated statements, then analyzed by 2 individuals, RQA, and GPTZero. RQA provided quantitative measures of lexical sophistication, whereas GPTZero used advanced machine learning algorithms to quantify AI-specific text characteristics.</p><p><strong>Results: </strong>Human raters demonstrated high agreement (κ = 0.92) and accuracy (97% and 99%). RQA parameters, particularly recurrence and max line, differentiated human- from AI-generated statements (areas under receiver operating characteristic [ROC] curve = 0.768 and 0.859, respectively). GPTZero parameters including simplicity, perplexity, and readability also differentiated human- from AI-generated statements (areas under ROC curve > 0.875).</p><p><strong>Discussion and conclusion: </strong>The study reveals that human raters, RQA, and GPTZero offer varying levels of accuracy in differentiating human-written from AI-generated personal statements. The findings could have important implications in academic admissions processes, where distinguishing between human- and AI-generated submissions is becoming increasingly important. Future research should explore integrating these methods to enhance the robustness and reliability of personal statement content evaluation across various domains. Three strategies for managing AI's role in applications-for applicants, governing organizations, and academic institutions-are provided to promote integrity and accountability in admission processes.</p>","PeriodicalId":517432,"journal":{"name":"Journal, physical therapy education","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal, physical therapy education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/JTE.0000000000000396","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Introduction: This study examines the ability of human readers, recurrence quantification analysis (RQA), and an online artificial intelligence (AI) detection tool (GPTZero) to distinguish between AI-generated and human-written personal statements in physical therapist education program applications.

Review of literature: The emergence of large language models such as ChatGPT and Google Gemini has raised concerns about the authenticity of personal statements. Previous studies have reported varying degrees of success in detecting AI-generated text.

Subjects: Data were collected from 50 randomly selected nonmatriculated individuals who applied to the Mayo Clinic School of Health Sciences Doctor of Physical Therapy Program during the 2021-2022 application cycle.

Methods: Fifty personal statements from applicants were pooled with 50 Google Gemini-generated statements, then analyzed by 2 individuals, RQA, and GPTZero. RQA provided quantitative measures of lexical sophistication, whereas GPTZero used advanced machine learning algorithms to quantify AI-specific text characteristics.
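
As an illustration of the RQA parameters highlighted in the Results below (recurrence and max line), a minimal categorical RQA over a token sequence is sketched here in Python. This is a hedged sketch only: the abstract does not specify the study's RQA configuration (embedding dimension, delay, radius), so recurrence is assumed here to mean exact token repetition.

def rqa_measures(tokens):
    n = len(tokens)
    # Recurrence plot: point (i, j) recurs when the tokens match (i != j).
    rec = [[tokens[i] == tokens[j] and i != j for j in range(n)]
           for i in range(n)]
    points = sum(sum(row) for row in rec)
    recurrence_rate = points / (n * n - n) if n > 1 else 0.0

    # Max line: the longest diagonal run of recurrent points, i.e. the
    # longest repeated sub-sequence of tokens.
    max_line = 0
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            run = run + 1 if rec[i][i + offset] else 0
            max_line = max(max_line, run)
    return recurrence_rate, max_line

tokens = "the quick brown fox and the quick red fox".split()
print(rqa_measures(tokens))  # -> (0.0833..., 2): "the quick" repeats once

Because AI-generated text tends to reuse phrasing more uniformly than human writing, higher recurrence and longer max lines plausibly separate the two classes, which is consistent with the ROC results reported below.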

Results: Human raters demonstrated high agreement (κ = 0.92) and accuracy (97% and 99%). RQA parameters, particularly recurrence and max line, differentiated human- from AI-generated statements (areas under receiver operating characteristic [ROC] curve = 0.768 and 0.859, respectively). GPTZero parameters including simplicity, perplexity, and readability also differentiated human- from AI-generated statements (areas under ROC curve > 0.875).
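
For context, Cohen's κ quantifies inter-rater agreement beyond chance, and the area under the ROC curve summarizes how well a continuous detector parameter separates the two classes (0.5 = chance, 1.0 = perfect separation). A minimal sketch of both computations with scikit-learn follows; the labels and scores are invented for illustration and are not the study's data.

from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Hypothetical ratings: 1 = judged AI-generated, 0 = judged human-written.
rater_a = [1, 0, 1, 1, 0, 0, 1, 0]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0]
print(cohen_kappa_score(rater_a, rater_b))  # 0.75

# Hypothetical detector scores (e.g., an RQA or GPTZero parameter)
# evaluated against ground-truth labels.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
score = [0.91, 0.12, 0.78, 0.25, 0.30, 0.22, 0.88, 0.15]
print(roc_auc_score(truth, score))  # 0.9375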

Discussion and conclusion: The study reveals that human raters, RQA, and GPTZero offer varying levels of accuracy in differentiating human-written from AI-generated personal statements. The findings could have important implications for academic admissions processes, where distinguishing between human- and AI-generated submissions is becoming increasingly important. Future research should explore integrating these methods to enhance the robustness and reliability of personal statement content evaluation across various domains. Three strategies for managing AI's role in applications, aimed at applicants, governing organizations, and academic institutions, are provided to promote integrity and accountability in admission processes.
