Have AI-Generated Texts from LLM Infiltrated the Realm of Scientific Writing? A Large-Scale Analysis of Preprint Platforms

Huzi Cheng, Bin Sheng, Aaron Lee, Varun Chaudhary, Atanas G. Atanasov, Nan Liu, Yue Qiu, Tien Yin Wong, Yih-Chung Tham, Ying-Feng Zheng
DOI: 10.1101/2024.03.25.586710 (https://doi.org/10.1101/2024.03.25.586710)
Source: bioRxiv - Scientific Communication and Education
Publication date: 2024-03-26
Citations: 0

Abstract

Since the release of ChatGPT in 2022, AI-generated texts have inevitably permeated various types of writing, sparking debates about the quality and quantity of content produced by large language models (LLMs). This study investigates a critical question: have AI-generated texts from LLMs infiltrated the realm of scientific writing, and if so, to what extent and in what settings? By analyzing a dataset of preprint manuscripts uploaded to arXiv, bioRxiv, and medRxiv over the past two years, we confirmed and quantified the widespread influence of AI-generated texts in scientific publications using the latest LLM-text detection technique, the Binoculars LLM-detector. Further analyses with this tool reveal that: (1) the AI influence correlates with the trend of ChatGPT web searches; (2) it is widespread across many scientific domains but differs in magnitude among them (highest: computer science, engineering sciences); (3) it varies with authors' language backgrounds and with geographic region, as determined by the location of their affiliations (>5%: Italy, China, average over countries); and (4) AI-generated texts are used in various content types within manuscripts (most significant: hypothesis formulation, conclusion summarization). Based on these findings, an AI-revision index is developed and calibrated, giving quantitative estimates of how AI is used in scientific writing. Suggestions about the advantages and safe use of AI-augmented scientific writing are discussed based on our observations.