Have AI-Generated Texts from LLM Infiltrated the Realm of Scientific Writing? A Large-Scale Analysis of Preprint Platforms
Huzi Cheng, Bin Sheng, Aaron Lee, Varun Chaudhary, Atanas G. Atanasov, Nan Liu, Yue Qiu, Tien Yin Wong, Yih-Chung Tham, Ying-Feng Zheng
bioRxiv - Scientific Communication and Education, published 2024-03-26
DOI: 10.1101/2024.03.25.586710
Abstract
Since the release of ChatGPT in 2022, AI-generated text has inevitably permeated many kinds of writing, sparking debates about the quality and quantity of content produced by large language models (LLMs). This study investigates a critical question: has AI-generated text from LLMs infiltrated scientific writing, and if so, to what extent and in what settings? Analyzing a dataset of preprint manuscripts uploaded to arXiv, bioRxiv, and medRxiv over the past two years with a recent LLM-text detection technique, the Binoculars LLM detector, we confirmed and quantified the widespread influence of AI-generated text on scientific publications. Further analyses with this tool reveal that: (1) the AI influence correlates with the trend in ChatGPT web searches; (2) it is widespread across scientific domains but differs in magnitude between them (highest in computer science and engineering sciences); (3) it varies with authors' language backgrounds and with geographic region, as determined by the location of their affiliations (>5%: Italy, China, average over countries); and (4) AI-generated text appears in various content types within manuscripts (most prominently in hypothesis formulation and conclusion summarization). Based on these findings, we develop and calibrate an AI-revision index that provides quantitative estimates of how AI is used in scientific writing. Finally, based on our observations, we discuss the advantages of AI-augmented scientific writing and offer suggestions for its safe use.
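The abstract names the Binoculars detector without describing how it scores text. As a rough illustration only, the sketch below computes the Binoculars score as described in the original Binoculars work: the log-perplexity of the text under one language model M1, divided by the cross-perplexity between M1's and a second model M2's next-token distributions. It assumes the per-position next-token logits of both models have already been computed (for example, by running two causal LMs over the same tokenized text); the function names and the toy inputs are hypothetical, and this is not the study's actual pipeline or its detection threshold. Lower scores suggest machine-generated text, and the cutoff would have to be calibrated separately.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one next-token logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def binoculars_score(tokens, logits_m1, logits_m2):
    """Hypothetical helper: B = log PPL_M1(s) / log X-PPL_{M1,M2}(s).

    tokens:    observed token ids x_1..x_L of the text s
    logits_m1: per-position next-token logits from model M1 (L lists of length V)
    logits_m2: per-position next-token logits from model M2 (same shape)
    """
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    log_ppl = 0.0   # mean negative log-likelihood of the observed tokens under M1
    log_xppl = 0.0  # mean cross-entropy of M2's distribution, weighted by M1's probabilities
    for tok, z1, z2 in zip(tokens, logits_m1, logits_m2):
        lp1 = log_softmax(z1)
        lp2 = log_softmax(z2)
        log_ppl -= lp1[tok]
        log_xppl -= sum(math.exp(a) * b for a, b in zip(lp1, lp2))
    return (log_ppl / n) / (log_xppl / n)


# Toy example with a 4-token vocabulary: M1 and M2 agree and are confident in token 0.
confident = [[5.0, 0.0, 0.0, 0.0]]
print(binoculars_score([0], confident, confident))  # well below 1: text matches what the models expect
print(binoculars_score([1], confident, confident))  # well above 1: observed token is surprising
```

The normalization by cross-perplexity is the design idea behind the method: it discounts text that is hard for any model to predict, so a low perplexity on its own is not enough to flag a passage.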