Margaret A Goodman, Anthony M Lee, Zachary Schreck, John H Hollman
{"title":"人还是机器?个人陈述中人工智能生成写作检测的比较分析。","authors":"Margaret A Goodman, Anthony M Lee, Zachary Schreck, John H Hollman","doi":"10.1097/JTE.0000000000000396","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>This study examines the ability of human readers, recurrence quantification analysis (RQA), and an online artificial intelligence (AI) detection tool (GPTZero) to distinguish between AI-generated and human-written personal statements in physical therapist education program applications.</p><p><strong>Review of literature: </strong>The emergence of large language models such as ChatGPT and Google Gemini has raised concerns about the authenticity of personal statements. Previous studies have reported varying degrees of success in detecting AI-generated text.</p><p><strong>Subjects: </strong>Data were collected from 50 randomly selected nonmatriculated individuals who applied to the Mayo Clinic School of Health Sciences Doctor of Physical Therapy Program during the 2021-2022 application cycle.</p><p><strong>Methods: </strong>Fifty personal statements from applicants were pooled with 50 Google Gemini-generated statements, then analyzed by 2 individuals, RQA, and GPTZero. RQA provided quantitative measures of lexical sophistication, whereas GPTZero used advanced machine learning algorithms to quantify AI-specific text characteristics.</p><p><strong>Results: </strong>Human raters demonstrated high agreement (κ = 0.92) and accuracy (97% and 99%). RQA parameters, particularly recurrence and max line, differentiated human- from AI-generated statements (areas under receiver operating characteristic [ROC] curve = 0.768 and 0.859, respectively). GPTZero parameters including simplicity, perplexity, and readability also differentiated human- from AI-generated statements (areas under ROC curve > 0.875).</p><p><strong>Discussion and conclusion: </strong>The study reveals that human raters, RQA, and GPTZero offer varying levels of accuracy in differentiating human-written from AI-generated personal statements. The findings could have important implications in academic admissions processes, where distinguishing between human- and AI-generated submissions is becoming increasingly important. Future research should explore integrating these methods to enhance the robustness and reliability of personal statement content evaluation across various domains. Three strategies for managing AI's role in applications-for applicants, governing organizations, and academic institutions-are provided to promote integrity and accountability in admission processes.</p>","PeriodicalId":517432,"journal":{"name":"Journal, physical therapy education","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Human or Machine? 
A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements.\",\"authors\":\"Margaret A Goodman, Anthony M Lee, Zachary Schreck, John H Hollman\",\"doi\":\"10.1097/JTE.0000000000000396\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>This study examines the ability of human readers, recurrence quantification analysis (RQA), and an online artificial intelligence (AI) detection tool (GPTZero) to distinguish between AI-generated and human-written personal statements in physical therapist education program applications.</p><p><strong>Review of literature: </strong>The emergence of large language models such as ChatGPT and Google Gemini has raised concerns about the authenticity of personal statements. Previous studies have reported varying degrees of success in detecting AI-generated text.</p><p><strong>Subjects: </strong>Data were collected from 50 randomly selected nonmatriculated individuals who applied to the Mayo Clinic School of Health Sciences Doctor of Physical Therapy Program during the 2021-2022 application cycle.</p><p><strong>Methods: </strong>Fifty personal statements from applicants were pooled with 50 Google Gemini-generated statements, then analyzed by 2 individuals, RQA, and GPTZero. RQA provided quantitative measures of lexical sophistication, whereas GPTZero used advanced machine learning algorithms to quantify AI-specific text characteristics.</p><p><strong>Results: </strong>Human raters demonstrated high agreement (κ = 0.92) and accuracy (97% and 99%). RQA parameters, particularly recurrence and max line, differentiated human- from AI-generated statements (areas under receiver operating characteristic [ROC] curve = 0.768 and 0.859, respectively). GPTZero parameters including simplicity, perplexity, and readability also differentiated human- from AI-generated statements (areas under ROC curve > 0.875).</p><p><strong>Discussion and conclusion: </strong>The study reveals that human raters, RQA, and GPTZero offer varying levels of accuracy in differentiating human-written from AI-generated personal statements. The findings could have important implications in academic admissions processes, where distinguishing between human- and AI-generated submissions is becoming increasingly important. Future research should explore integrating these methods to enhance the robustness and reliability of personal statement content evaluation across various domains. 
Three strategies for managing AI's role in applications-for applicants, governing organizations, and academic institutions-are provided to promote integrity and accountability in admission processes.</p>\",\"PeriodicalId\":517432,\"journal\":{\"name\":\"Journal, physical therapy education\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal, physical therapy education\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1097/JTE.0000000000000396\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal, physical therapy education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/JTE.0000000000000396","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Human or Machine? A Comparative Analysis of Artificial Intelligence-Generated Writing Detection in Personal Statements.
Introduction: This study examines the ability of human readers, recurrence quantification analysis (RQA), and an online artificial intelligence (AI) detection tool (GPTZero) to distinguish between AI-generated and human-written personal statements in physical therapist education program applications.
Review of literature: The emergence of large language models such as ChatGPT and Google Gemini has raised concerns about the authenticity of personal statements. Previous studies have reported varying degrees of success in detecting AI-generated text.
Subjects: Data were collected from 50 randomly selected nonmatriculated individuals who applied to the Mayo Clinic School of Health Sciences Doctor of Physical Therapy Program during the 2021-2022 application cycle.
Methods: Fifty personal statements from applicants were pooled with 50 Google Gemini-generated statements, then analyzed by 2 human raters, RQA, and GPTZero. RQA provided quantitative measures of lexical sophistication, whereas GPTZero used machine learning algorithms to quantify AI-specific text characteristics.
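For readers unfamiliar with applying RQA to text, the following is a minimal sketch, not the authors' implementation. It assumes a simple categorical recurrence scheme in which two word tokens "recur" when they are identical, and it derives two of the parameters named in the results: recurrence rate and max line (the longest repeated word sequence). Tokenization, stop-word handling, and embedding choices in the actual study may differ.

```python
import numpy as np

def rqa_measures(text: str):
    """Categorical RQA over word tokens: recurrence rate and max diagonal line."""
    tokens = np.array(text.lower().split())
    n = len(tokens)

    # Recurrence matrix: entry (i, j) is 1 when tokens i and j are the same word.
    rec = (tokens[:, None] == tokens[None, :]).astype(int)
    np.fill_diagonal(rec, 0)  # exclude trivial self-recurrences

    # Recurrence rate: proportion of recurrent points among all off-diagonal points.
    recurrence_rate = rec.sum() / (n * n - n)

    # Max line: longest run of consecutive recurrences along any off-diagonal,
    # i.e., the longest word sequence that repeats verbatim in the statement.
    max_line = 0
    for k in range(1, n):
        run = best = 0
        for v in np.diagonal(rec, offset=k):
            run = run + 1 if v else 0
            best = max(best, run)
        max_line = max(max_line, best)

    return recurrence_rate, max_line

rr, ml = rqa_measures("the quick brown fox jumps over the lazy dog the quick brown fox")
print(f"recurrence rate = {rr:.3f}, max line = {ml}")
```

Higher recurrence and longer repeated sequences indicate more lexical repetition, which is one way a statement's "lexical sophistication" can be quantified.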
Results: Human raters demonstrated high agreement (κ = 0.92) and accuracy (97% and 99%). RQA parameters, particularly recurrence and max line, differentiated human- from AI-generated statements (areas under receiver operating characteristic [ROC] curve = 0.768 and 0.859, respectively). GPTZero parameters including simplicity, perplexity, and readability also differentiated human- from AI-generated statements (areas under ROC curve > 0.875).
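The areas under the ROC curve reported above summarize how well a single parameter separates human-written from AI-generated statements. The sketch below shows how such an AUC and a candidate decision threshold can be computed with scikit-learn; the score values are synthetic placeholders for illustration, not data from the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical scores for one RQA or GPTZero parameter per statement.
# Label 0 = human-written, 1 = AI-generated (synthetic values for illustration).
rng = np.random.default_rng(0)
human_scores = rng.normal(loc=0.30, scale=0.10, size=50)
ai_scores = rng.normal(loc=0.45, scale=0.10, size=50)

labels = np.concatenate([np.zeros(50), np.ones(50)])
scores = np.concatenate([human_scores, ai_scores])

auc = roc_auc_score(labels, scores)
fpr, tpr, thresholds = roc_curve(labels, scores)

# Youden's J statistic (tpr - fpr) suggests the threshold that best separates groups.
best = np.argmax(tpr - fpr)
print(f"AUC = {auc:.3f}, suggested cutoff = {thresholds[best]:.3f}")
```

An AUC near 0.5 indicates no discriminative value, whereas values approaching 1.0 (such as the >0.875 reported for the GPTZero parameters) indicate strong separation between the two groups.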
Discussion and conclusion: The study reveals that human raters, RQA, and GPTZero offer varying levels of accuracy in differentiating human-written from AI-generated personal statements. The findings could have important implications for academic admissions processes, where distinguishing between human- and AI-generated submissions is becoming increasingly important. Future research should explore integrating these methods to enhance the robustness and reliability of personal statement content evaluation across various domains. Three strategies for managing AI's role in applications, aimed at applicants, governing organizations, and academic institutions, are provided to promote integrity and accountability in admission processes.