{"title":"Where there's a will there's a way: ChatGPT is used more for science in countries where it is prohibited","authors":"Honglin Bao, Mengyi Sun, Misha Teplitskiy","doi":"arxiv-2406.11583","DOIUrl":null,"url":null,"abstract":"Regulating AI has emerged as a key societal challenge, but which methods of\nregulation are effective is unclear. Here, we measure the effectiveness of\nrestricting AI services geographically using the case of ChatGPT and science.\nOpenAI prohibits access to ChatGPT from several countries including China and\nRussia. If the restrictions are effective, there should be minimal use of\nChatGPT in prohibited countries. Drawing on the finding that early versions of\nChatGPT overrepresented distinctive words like \"delve,\" we developed a simple\nensemble classifier by training it on abstracts before and after ChatGPT\n\"polishing\". Testing on held-out abstracts and those where authors\nself-declared to have used AI for writing shows that our classifier\nsubstantially outperforms off-the-shelf LLM detectors like GPTZero and ZeroGPT.\nApplying the classifier to preprints from Arxiv, BioRxiv, and MedRxiv reveals\nthat ChatGPT was used in approximately 12.6% of preprints by August 2023 and\nuse was 7.7% higher in countries without legal access. Crucially, these\npatterns appeared before the first major legal LLM became widely available in\nChina, the largest restricted-country preprint producer. ChatGPT use was\nassociated with higher views and downloads, but not citations or journal\nplacement. Overall, restricting ChatGPT geographically has proven ineffective\nin science and possibly other domains, likely due to widespread workarounds.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"17 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.11583","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Regulating AI has emerged as a key societal challenge, but which methods of regulation are effective is unclear. Here, we measure the effectiveness of restricting AI services geographically, using the case of ChatGPT and science. OpenAI prohibits access to ChatGPT from several countries, including China and Russia. If the restrictions are effective, there should be minimal use of ChatGPT in the prohibited countries. Drawing on the finding that early versions of ChatGPT overrepresented distinctive words like "delve," we developed a simple ensemble classifier trained on abstracts before and after ChatGPT "polishing." Testing on held-out abstracts and on abstracts whose authors self-declared using AI for writing shows that our classifier substantially outperforms off-the-shelf LLM detectors such as GPTZero and ZeroGPT. Applying the classifier to preprints from arXiv, bioRxiv, and medRxiv reveals that ChatGPT was used in approximately 12.6% of preprints by August 2023, and that use was 7.7% higher in countries without legal access. Crucially, these patterns appeared before the first major legal LLM became widely available in China, the largest restricted-country preprint producer. ChatGPT use was associated with higher views and downloads, but not with citations or journal placement. Overall, restricting ChatGPT geographically has proven ineffective in science, and possibly in other domains, likely due to widespread workarounds.
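
The abstract describes the detector only at a high level: an ensemble classifier trained on word-frequency differences between original abstracts and ChatGPT-"polished" versions. The sketch below is a minimal illustration of that general idea, not the authors' implementation; the toy corpus, the TF-IDF features, and the specific ensemble (logistic regression plus random forest with soft voting) are assumptions made purely for the example.

    # Minimal sketch: classify abstracts as human-written (0) vs. ChatGPT-polished (1)
    # using word-frequency features and a small voting ensemble (scikit-learn).
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Hypothetical toy corpus; real training data would be abstracts before and
    # after ChatGPT "polishing".
    texts = [
        "We measure the effect of policy X on outcome Y using panel data.",
        "In this study, we delve into the intricate interplay between policy X and outcome Y.",
        "The sample consists of 120 firms observed between 2010 and 2020.",
        "We meticulously underscore the pivotal role of firm-level heterogeneity.",
    ]
    labels = [0, 1, 0, 1]

    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.5, random_state=0, stratify=labels
    )

    # Bag-of-words features let the model pick up overrepresented "ChatGPT words"
    # such as "delve", "intricate", or "pivotal".
    vectorizer = TfidfVectorizer(lowercase=True, ngram_range=(1, 1), min_df=1)

    ensemble = VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        voting="soft",  # average predicted probabilities across the two models
    )

    model = make_pipeline(vectorizer, ensemble)
    model.fit(X_train, y_train)
    print("Held-out accuracy:", model.score(X_test, y_test))

On a real before/after corpus, word-level signals of the kind the abstract mentions ("delve" and similar terms) are the features such a model would plausibly rely on, which is consistent with the authors' report that a simple classifier can outperform generic detectors on this task.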