{"title":"大型语言模型内容检测器:人类编辑和ChatGPT模仿个人风格的影响-初步研究。","authors":"Shigeki Matsubara","doi":"10.1177/10815589251352572","DOIUrl":null,"url":null,"abstract":"<p><p>The use of large language model (LLM), such as ChatGPT, in academic writing is increasing, but the extent to which LLM-generated content can evade detection remains unclear. This descriptive pilot study investigates whether LLM-generated abstracts, edited by humans or LLM trained to mimic a specific writing style, can escape LLM detectors. Using a previously published original article, ChatGPT-4 generated an abstract (Abstract 1). This abstract underwent three modifications: context-based human editing (Abstract 2), stylistic human editing (Abstract 3), and ChatGPT editing incorporating the author's writing style (Abstract 4). The genuine human-written abstract from the original article served as Abstract 5. Five freely available LLM detectors analyzed these abstracts, providing LLM-generated probability scores. The genuinely LLM-generated manuscript (Abstract 1) was judged as LLM-generated with 82%-100% (median: 100%) probability. The genuinely human-written manuscript (Abstract 5) was judged as human-written with the LLM-generated probability of 0%-13% (median: 0%). Human-edited abstracts (Abstracts 2 and 3) exhibited a decreasing LLM-generated probability 4%-71% (median: 64%) and 2%-65% (median: 61%), respectively, but varied widely among detectors. The LLM-mimicked abstract (Abstract 4) was classified as LLM-generated, with LLM-generated probability ranging 82%-100% (median: 100%). The results showed variations across different LLM detectors. Supplementary experiments demonstrated a similar trend. Human editing reduces LLM-detection probabilities but does not guarantee evasion. LLM-generated content mimicking a specific writing style remains largely detectable. This preliminary experiment provided a novel study concept. Further studies on various manuscripts and different LLM detection methods will enhance understanding of the relationship between LLM-aided paper writing and LLM detectors.</p>","PeriodicalId":520677,"journal":{"name":"Journal of investigative medicine : the official publication of the American Federation for Clinical Research","volume":" ","pages":"10815589251352572"},"PeriodicalIF":2.0000,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large language model content detectors: Effects of human editing and ChatGPT mimicking individual style-A preliminary study.\",\"authors\":\"Shigeki Matsubara\",\"doi\":\"10.1177/10815589251352572\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The use of large language model (LLM), such as ChatGPT, in academic writing is increasing, but the extent to which LLM-generated content can evade detection remains unclear. This descriptive pilot study investigates whether LLM-generated abstracts, edited by humans or LLM trained to mimic a specific writing style, can escape LLM detectors. Using a previously published original article, ChatGPT-4 generated an abstract (Abstract 1). This abstract underwent three modifications: context-based human editing (Abstract 2), stylistic human editing (Abstract 3), and ChatGPT editing incorporating the author's writing style (Abstract 4). The genuine human-written abstract from the original article served as Abstract 5. Five freely available LLM detectors analyzed these abstracts, providing LLM-generated probability scores. 
The genuinely LLM-generated manuscript (Abstract 1) was judged as LLM-generated with 82%-100% (median: 100%) probability. The genuinely human-written manuscript (Abstract 5) was judged as human-written with the LLM-generated probability of 0%-13% (median: 0%). Human-edited abstracts (Abstracts 2 and 3) exhibited a decreasing LLM-generated probability 4%-71% (median: 64%) and 2%-65% (median: 61%), respectively, but varied widely among detectors. The LLM-mimicked abstract (Abstract 4) was classified as LLM-generated, with LLM-generated probability ranging 82%-100% (median: 100%). The results showed variations across different LLM detectors. Supplementary experiments demonstrated a similar trend. Human editing reduces LLM-detection probabilities but does not guarantee evasion. LLM-generated content mimicking a specific writing style remains largely detectable. This preliminary experiment provided a novel study concept. Further studies on various manuscripts and different LLM detection methods will enhance understanding of the relationship between LLM-aided paper writing and LLM detectors.</p>\",\"PeriodicalId\":520677,\"journal\":{\"name\":\"Journal of investigative medicine : the official publication of the American Federation for Clinical Research\",\"volume\":\" \",\"pages\":\"10815589251352572\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-06-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of investigative medicine : the official publication of the American Federation for Clinical Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/10815589251352572\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of investigative medicine : the official publication of the American Federation for Clinical Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/10815589251352572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Large language model content detectors: Effects of human editing and ChatGPT mimicking individual style-A preliminary study.
The use of large language models (LLMs), such as ChatGPT, in academic writing is increasing, but the extent to which LLM-generated content can evade detection remains unclear. This descriptive pilot study investigates whether LLM-generated abstracts, edited by humans or by an LLM trained to mimic a specific writing style, can escape LLM detectors. Using a previously published original article, ChatGPT-4 generated an abstract (Abstract 1). This abstract underwent three modifications: context-based human editing (Abstract 2), stylistic human editing (Abstract 3), and ChatGPT editing incorporating the author's writing style (Abstract 4). The genuine human-written abstract from the original article served as Abstract 5. Five freely available LLM detectors analyzed these abstracts, each providing an LLM-generated probability score. The genuinely LLM-generated abstract (Abstract 1) was judged LLM-generated with 82%-100% probability (median: 100%). The genuinely human-written abstract (Abstract 5) was judged human-written, with an LLM-generated probability of 0%-13% (median: 0%). The human-edited abstracts (Abstracts 2 and 3) showed reduced LLM-generated probabilities of 4%-71% (median: 64%) and 2%-65% (median: 61%), respectively, although scores varied widely among detectors. The style-mimicking abstract (Abstract 4) was classified as LLM-generated, with probabilities ranging from 82% to 100% (median: 100%). Results varied across the different LLM detectors, and supplementary experiments showed a similar trend. Human editing reduces LLM-detection probabilities but does not guarantee evasion; LLM-generated content mimicking a specific writing style remains largely detectable. This preliminary experiment offers a novel study concept. Further studies on varied manuscripts and different LLM detection methods will deepen understanding of the relationship between LLM-aided paper writing and LLM detectors.
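The summary statistics reported above (the per-abstract score range and median across the five detectors) are straightforward to reproduce. Below is a minimal Python sketch of that aggregation step; the individual detector scores are illustrative placeholders chosen only to be consistent with the reported ranges and medians, not the study's actual data, and the variant labels are paraphrases of the abstract descriptions.

```python
from statistics import median

# Hypothetical LLM-generated probability scores (%) from five detectors
# for each abstract variant. These are placeholder values consistent with
# the reported ranges and medians, NOT the study's actual detector outputs.
scores = {
    "Abstract 1 (ChatGPT-4 generated)":      [82, 95, 100, 100, 100],
    "Abstract 2 (context-based human edit)": [4, 30, 64, 70, 71],
    "Abstract 3 (stylistic human edit)":     [2, 25, 61, 63, 65],
    "Abstract 4 (ChatGPT mimicking style)":  [82, 96, 100, 100, 100],
    "Abstract 5 (genuine human-written)":    [0, 0, 0, 5, 13],
}

for variant, probs in scores.items():
    # Report the same statistics used in the abstract:
    # the min-max range and the median across detectors.
    print(f"{variant}: {min(probs)}%-{max(probs)}% (median: {median(probs)}%)")
```

Run as-is, this prints one line per variant, e.g. "Abstract 2 (context-based human edit): 4%-71% (median: 64%)", mirroring how the study condenses five detector scores into a range and a median per abstract.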