{"title":"AI vs academia: Experimental study on AI text detectors' accuracy in behavioral health academic writing.","authors":"Andrey A Popkov, Tyson S Barrett","doi":"10.1080/08989621.2024.2331757","DOIUrl":null,"url":null,"abstract":"<p><p>Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016-2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by \"ChatGPT\" and 100 by \"Claude\"). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.</p>","PeriodicalId":50927,"journal":{"name":"Accountability in Research-Policies and Quality Assurance","volume":" ","pages":"1072-1088"},"PeriodicalIF":4.0000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accountability in Research-Policies and Quality Assurance","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/08989621.2024.2331757","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/3/22 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MEDICAL ETHICS","Score":null,"Total":0}
Citations: 0
Abstract
Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare that have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in distinguishing human-written from AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles published from 2016 to 2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by "ChatGPT" and 100 by "Claude"). The free AI detector flagged a median of 27.2% of the human-written academic text as AI-generated, while the commercial software Originality.AI performed better but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.
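To make the abstract's error metrics concrete, the short Python sketch below shows how false positive and false negative rates, and the median flagged proportion, would be computed from per-document detector scores. It is illustrative only: the scores, the 0.5 decision cutoff, and all variable names are assumptions for this sketch, not the study's actual data or method.

import statistics

# Hypothetical detector outputs: for each document, the proportion of its
# text the detector labels as AI-generated (illustrative values only).
human_scores = [0.05, 0.31, 0.27, 0.10, 0.44]  # human-written articles
ai_scores = [0.92, 0.38, 0.71, 0.55, 0.18]     # chatbot-generated texts

THRESHOLD = 0.5  # assumed cutoff: flag a text as AI if score >= 0.5

# False positives: human-written texts the detector flags as AI-generated.
fp = sum(s >= THRESHOLD for s in human_scores)
# False negatives: AI-generated texts the detector passes as human-written.
fn = sum(s < THRESHOLD for s in ai_scores)

fpr = fp / len(human_scores)
fnr = fn / len(ai_scores)

# The study reports the median proportion of human-written text flagged as
# AI-generated (27.2% for the free detector); this computes the analogue
# for the hypothetical scores above.
median_flagged = statistics.median(human_scores)

print(f"False positive rate: {fpr:.1%}")
print(f"False negative rate: {fnr:.1%}")
print(f"Median proportion flagged as AI: {median_flagged:.1%}")

Under this framing, the study's concern is that both error rates are nonzero in practice: a high false positive rate penalizes human authors, while a high false negative rate (as reported for Claude-generated texts) undermines enforcement.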
Journal Introduction:
Accountability in Research: Policies and Quality Assurance is devoted to the examination and critical analysis of systems for maximizing integrity in the conduct of research. It provides an interdisciplinary, international forum for the development of ethics, procedures, standards, policies, and concepts to encourage the ethical conduct of research and to enhance the validity of research results.
The journal welcomes views on advancing the integrity of research in the fields of general and multidisciplinary sciences, medicine, law, economics, statistics, management studies, public policy, politics, sociology, history, psychology, philosophy, ethics, and information science.
All submitted manuscripts are subject to initial appraisal by the Editor and, if found suitable for further consideration, to peer review by independent, anonymous expert referees.