Keenan S Fine, Emily E Zona, Aidan W O'Shea, Ellen C Shaffrey, Pradeep K Attaluri, Peter J Wirth, Aaron M Dingle, Samuel Poore
{"title":"Evaluating the Detection Accuracy of AI-Generated Content in Plastic Surgery: A Comparative Study of Medical Professionals and AI Tools.","authors":"Keenan S Fine, Emily E Zona, Aidan W O'Shea, Ellen C Shaffrey, Pradeep K Attaluri, Peter J Wirth, Aaron M Dingle, Samuel Poore","doi":"10.1097/PRS.0000000000012242","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The growing use of artificial intelligence (AI) in academic writing raises concerns about the integrity of scientific manuscripts and the ability to accurately distinguish human-written from AI-generated content. This study evaluates the ability of medical professionals and AI-detection tools to identify AI involvement in plastic surgery manuscripts.</p><p><strong>Methods: </strong>Eight manuscript passages across four topics were assessed, with four on plastic surgery. Passages were human-written, human-written with AI edits, or fully AI-generated. Twenty-four raters, including medical students, residents, and attendings, classified the passages by origin. Interrater reliability was measured using Fleiss' kappa. Human-written and AI-generated manuscripts were analyzed using three different online AI detection tools. A receiver operator curve (ROC) analysis was conducted to assess their accuracy in detecting AI-generated content. Intraclass correlation coefficients (ICC) were calculated to assess the agreement among the detection tools; these tools identify AI-generated content within the passages in terms of percentage generated by AI.</p><p><strong>Results: </strong>Raters correctly identified the origin of passages 26.5% of the time. For AI-generated passages, accuracy was 34.4%, and for human-written passages, 14.5% (p=0.012). Interrater reliability was poor (kappa=0.078). AI detection tools showed strong discriminatory power (AUC=0.962), but false-positives were frequent at optimal cutoffs (25%-50%). The intraclass correlation coefficient (ICC) between tools was low (-0.118).</p><p><strong>Conclusions: </strong>Medical professionals and AI detection tools struggle to reliably identify AI-generated content. While AI tools demonstrated high discriminatory power, they often misclassified human-written passages. These findings highlight the need for improved methods to protect the integrity of scientific writing and prevent false plagiarism claims.</p>","PeriodicalId":20128,"journal":{"name":"Plastic and reconstructive surgery","volume":" ","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Plastic and reconstructive surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1097/PRS.0000000000012242","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SURGERY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: The growing use of artificial intelligence (AI) in academic writing raises concerns about the integrity of scientific manuscripts and the ability to accurately distinguish human-written from AI-generated content. This study evaluates the ability of medical professionals and AI-detection tools to identify AI involvement in plastic surgery manuscripts.
Methods: Eight manuscript passages spanning four topics were assessed, four of which addressed plastic surgery. Passages were human-written, human-written with AI edits, or fully AI-generated. Twenty-four raters, including medical students, residents, and attendings, classified the passages by origin. Interrater reliability was measured using Fleiss' kappa. Human-written and AI-generated manuscripts were analyzed using three different online AI detection tools, and a receiver operating characteristic (ROC) curve analysis was conducted to assess their accuracy in detecting AI-generated content. Intraclass correlation coefficients (ICCs) were calculated to assess agreement among the detection tools, each of which reports the percentage of a passage it estimates to be AI-generated.
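As a rough illustration of the statistics named above (not the authors' actual analysis code), the sketch below shows how Fleiss' kappa, ROC AUC, and an intraclass correlation coefficient might be computed in Python. All data values are invented, and the statsmodels, scikit-learn, pandas, and pingouin libraries are assumed to be available.

```python
# Minimal sketch with hypothetical data for the statistics described in Methods:
# Fleiss' kappa for interrater reliability, ROC AUC for AI-detection scores,
# and an intraclass correlation coefficient (ICC) for agreement among tools.
import numpy as np
import pandas as pd
import pingouin as pg  # assumed choice for ICC; other packages would also work
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# --- Fleiss' kappa: 8 passages classified by 24 raters into 3 origin categories ---
ratings = rng.integers(0, 3, size=(8, 24))            # rows = passages, cols = raters
counts, _ = aggregate_raters(ratings)                 # category counts per passage
print("Fleiss' kappa:", fleiss_kappa(counts))

# --- ROC analysis: detector's "% AI-generated" score vs. true passage origin ---
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # 0 = human-written, 1 = AI-generated
scores = np.array([5, 30, 10, 45, 80, 95, 60, 99])    # invented "% AI" scores
print("AUC:", roc_auc_score(y_true, scores))

# --- ICC: agreement among three detection tools scoring the same passages ---
df = pd.DataFrame({
    "passage": np.repeat(np.arange(8), 3),
    "tool": np.tile(["A", "B", "C"], 8),
    "score": rng.uniform(0, 100, size=24),
})
print(pg.intraclass_corr(data=df, targets="passage", raters="tool", ratings="score"))
```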
Results: Raters correctly identified the origin of passages 26.5% of the time. Accuracy was 34.4% for AI-generated passages and 14.5% for human-written passages (p=0.012). Interrater reliability was poor (kappa=0.078). AI detection tools showed strong discriminatory power (AUC=0.962), but false positives were frequent at the optimal cutoffs (25%-50%). Agreement among the tools was low (ICC=-0.118).
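For context on how an "optimal cutoff" might be chosen from a ROC curve, here is a minimal hypothetical sketch using Youden's J statistic; the labels and scores are invented for illustration and do not come from the study.

```python
# Hypothetical sketch: selecting a cutoff on a detector's "% AI-generated" score
# via Youden's J statistic, then reporting the false-positive rate at that cutoff.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # 0 = human-written, 1 = AI-generated
scores = np.array([5, 30, 10, 45, 80, 95, 60, 99])    # invented "% AI" scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmax(tpr - fpr)                            # maximize Youden's J = TPR - FPR
print(f"optimal cutoff ~ {thresholds[best]}%, false-positive rate there = {fpr[best]:.2f}")
```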
Conclusions: Medical professionals and AI detection tools struggle to reliably identify AI-generated content. While AI tools demonstrated high discriminatory power, they often misclassified human-written passages. These findings highlight the need for improved methods to protect the integrity of scientific writing and prevent false plagiarism claims.
About the Journal
For more than 70 years, Plastic and Reconstructive Surgery® has been the one consistently excellent reference for every specialist who uses plastic surgery techniques or works in conjunction with a plastic surgeon. Plastic and Reconstructive Surgery®, the official journal of the American Society of Plastic Surgeons, is a benefit of Society membership and is also available on a subscription basis.
Plastic and Reconstructive Surgery® brings subscribers up-to-the-minute reports on the latest techniques and follow-up for all areas of plastic and reconstructive surgery, including breast reconstruction, experimental studies, maxillofacial reconstruction, hand and microsurgery, burn repair, and cosmetic surgery, as well as news on medicolegal issues. The cosmetic section provides expanded coverage of new procedures and techniques and offers more cosmetic-specific content than any other journal. All subscribers enjoy full access to the Journal's website, which features broadcast-quality videos of reconstructive and cosmetic procedures, podcasts, comprehensive article archives dating to 1946, and additional benefits offered by the newly redesigned website.