{"title":"大型语言模型内容检测器:人类编辑和ChatGPT模仿个人风格的影响-初步研究。","authors":"Shigeki Matsubara","doi":"10.1177/10815589251352572","DOIUrl":null,"url":null,"abstract":"<p><p>The use of large language model (LLM), such as ChatGPT, in academic writing is increasing, but the extent to which LLM-generated content can evade detection remains unclear. This descriptive pilot study investigates whether LLM-generated abstracts, edited by humans or LLM trained to mimic a specific writing style, can escape LLM detectors. Using a previously published original article, ChatGPT-4 generated an abstract (Abstract 1). This abstract underwent three modifications: context-based human editing (Abstract 2), stylistic human editing (Abstract 3), and ChatGPT editing incorporating the author's writing style (Abstract 4). The genuine human-written abstract from the original article served as Abstract 5. Five freely available LLM detectors analyzed these abstracts, providing LLM-generated probability scores. The genuinely LLM-generated manuscript (Abstract 1) was judged as LLM-generated with 82%-100% (median: 100%) probability. The genuinely human-written manuscript (Abstract 5) was judged as human-written with the LLM-generated probability of 0%-13% (median: 0%). Human-edited abstracts (Abstracts 2 and 3) exhibited a decreasing LLM-generated probability 4%-71% (median: 64%) and 2%-65% (median: 61%), respectively, but varied widely among detectors. The LLM-mimicked abstract (Abstract 4) was classified as LLM-generated, with LLM-generated probability ranging 82%-100% (median: 100%). The results showed variations across different LLM detectors. Supplementary experiments demonstrated a similar trend. Human editing reduces LLM-detection probabilities but does not guarantee evasion. LLM-generated content mimicking a specific writing style remains largely detectable. This preliminary experiment provided a novel study concept. Further studies on various manuscripts and different LLM detection methods will enhance understanding of the relationship between LLM-aided paper writing and LLM detectors.</p>","PeriodicalId":520677,"journal":{"name":"Journal of investigative medicine : the official publication of the American Federation for Clinical Research","volume":" ","pages":"10815589251352572"},"PeriodicalIF":2.0000,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large language model content detectors: Effects of human editing and ChatGPT mimicking individual style-A preliminary study.\",\"authors\":\"Shigeki Matsubara\",\"doi\":\"10.1177/10815589251352572\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The use of large language model (LLM), such as ChatGPT, in academic writing is increasing, but the extent to which LLM-generated content can evade detection remains unclear. This descriptive pilot study investigates whether LLM-generated abstracts, edited by humans or LLM trained to mimic a specific writing style, can escape LLM detectors. Using a previously published original article, ChatGPT-4 generated an abstract (Abstract 1). This abstract underwent three modifications: context-based human editing (Abstract 2), stylistic human editing (Abstract 3), and ChatGPT editing incorporating the author's writing style (Abstract 4). The genuine human-written abstract from the original article served as Abstract 5. Five freely available LLM detectors analyzed these abstracts, providing LLM-generated probability scores. 
The genuinely LLM-generated manuscript (Abstract 1) was judged as LLM-generated with 82%-100% (median: 100%) probability. The genuinely human-written manuscript (Abstract 5) was judged as human-written with the LLM-generated probability of 0%-13% (median: 0%). Human-edited abstracts (Abstracts 2 and 3) exhibited a decreasing LLM-generated probability 4%-71% (median: 64%) and 2%-65% (median: 61%), respectively, but varied widely among detectors. The LLM-mimicked abstract (Abstract 4) was classified as LLM-generated, with LLM-generated probability ranging 82%-100% (median: 100%). The results showed variations across different LLM detectors. Supplementary experiments demonstrated a similar trend. Human editing reduces LLM-detection probabilities but does not guarantee evasion. LLM-generated content mimicking a specific writing style remains largely detectable. This preliminary experiment provided a novel study concept. Further studies on various manuscripts and different LLM detection methods will enhance understanding of the relationship between LLM-aided paper writing and LLM detectors.</p>\",\"PeriodicalId\":520677,\"journal\":{\"name\":\"Journal of investigative medicine : the official publication of the American Federation for Clinical Research\",\"volume\":\" \",\"pages\":\"10815589251352572\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of investigative medicine : the official publication of the American Federation for Clinical Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/10815589251352572\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of investigative medicine : the official publication of the American Federation for Clinical Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/10815589251352572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Large language model content detectors: Effects of human editing and ChatGPT mimicking individual style-A preliminary study.
The use of large language models (LLMs), such as ChatGPT, in academic writing is increasing, but the extent to which LLM-generated content can evade detection remains unclear. This descriptive pilot study investigates whether LLM-generated abstracts, edited by humans or by an LLM trained to mimic a specific writing style, can escape LLM detectors. Using a previously published original article, ChatGPT-4 generated an abstract (Abstract 1). This abstract underwent three modifications: context-based human editing (Abstract 2), stylistic human editing (Abstract 3), and ChatGPT editing incorporating the author's writing style (Abstract 4). The genuine human-written abstract from the original article served as Abstract 5. Five freely available LLM detectors analyzed these abstracts, each providing an LLM-generated probability score. The genuinely LLM-generated abstract (Abstract 1) was judged LLM-generated with 82%-100% probability (median: 100%). The genuinely human-written abstract (Abstract 5) was judged human-written, with an LLM-generated probability of 0%-13% (median: 0%). The human-edited abstracts (Abstracts 2 and 3) showed reduced LLM-generated probabilities of 4%-71% (median: 64%) and 2%-65% (median: 61%), respectively, although scores varied widely among detectors. The style-mimicking abstract (Abstract 4) was classified as LLM-generated, with probabilities ranging from 82% to 100% (median: 100%). Results varied across the different LLM detectors, and supplementary experiments showed a similar trend. Human editing reduces LLM-detection probabilities but does not guarantee evasion; LLM-generated content mimicking a specific writing style remains largely detectable. This preliminary experiment offers a novel study concept. Further studies on varied manuscripts and different LLM detection methods will deepen understanding of the relationship between LLM-aided paper writing and LLM detectors.
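The summary statistics reported above (the per-abstract score range and median across the five detectors) are straightforward to reproduce. Below is a minimal Python sketch of that aggregation step; the individual detector scores are illustrative placeholders chosen only to be consistent with the reported ranges and medians, not the study's actual data, and the variant labels are paraphrases of the abstract descriptions.

```python
from statistics import median

# Hypothetical LLM-generated probability scores (%) from five detectors
# for each abstract variant. These are placeholder values consistent with
# the reported ranges and medians, NOT the study's actual detector outputs.
scores = {
    "Abstract 1 (ChatGPT-4 generated)":      [82, 95, 100, 100, 100],
    "Abstract 2 (context-based human edit)": [4, 30, 64, 70, 71],
    "Abstract 3 (stylistic human edit)":     [2, 25, 61, 63, 65],
    "Abstract 4 (ChatGPT mimicking style)":  [82, 96, 100, 100, 100],
    "Abstract 5 (genuine human-written)":    [0, 0, 0, 5, 13],
}

for variant, probs in scores.items():
    # Report the same statistics used in the abstract:
    # the min-max range and the median across detectors.
    print(f"{variant}: {min(probs)}%-{max(probs)}% (median: {median(probs)}%)")
```

Run as-is, this prints one line per variant, e.g. "Abstract 2 (context-based human edit): 4%-71% (median: 64%)", mirroring how the study condenses five detector scores into a range and a median per abstract.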