Honglin Bao, Mengyi Sun, Misha Teplitskiy
arXiv:2406.11583 · arXiv - CS - Digital Libraries · 2024-06-17
Where there's a will there's a way: ChatGPT is used more for science in countries where it is prohibited
Regulating AI has emerged as a key societal challenge, but which methods of
regulation are effective is unclear. Here, we measure the effectiveness of
restricting AI services geographically using the case of ChatGPT and science.
OpenAI prohibits access to ChatGPT from several countries including China and
Russia. If the restrictions are effective, there should be minimal use of
ChatGPT in prohibited countries. Drawing on the finding that early versions of
ChatGPT overrepresented distinctive words like "delve," we developed a simple
ensemble classifier, training it on abstracts from before and after ChatGPT
"polishing." Tests on held-out abstracts, and on abstracts whose authors
self-declared using AI for writing, show that our classifier
substantially outperforms off-the-shelf LLM detectors such as GPTZero and ZeroGPT.
Applying the classifier to preprints from arXiv, bioRxiv, and medRxiv reveals
that ChatGPT was used in approximately 12.6% of preprints by August 2023 and
use was 7.7% higher in countries without legal access. Crucially, these
patterns appeared before the first major legal LLM became widely available in
China, the largest restricted-country preprint producer. ChatGPT use was
associated with higher views and downloads, but not citations or journal
placement. Overall, restricting ChatGPT geographically has proven ineffective
in science and possibly other domains, likely due to widespread workarounds.
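The detection idea above can be illustrated with a minimal sketch: score a text by the rate of ChatGPT-associated marker words and flag texts above a threshold. Note this is only an illustration of the underlying signal; the marker list and threshold here are invented for the example, whereas the paper's actual method is an ensemble classifier trained on paired abstracts before and after ChatGPT "polishing."

```python
import re

# Illustrative marker words only; the paper's classifier learns its own
# features from polished vs. unpolished abstracts rather than using a
# fixed hand-picked list.
MARKERS = {"delve", "delves", "delving", "intricate", "showcasing", "pivotal"}

def marker_rate(text: str) -> float:
    """Fraction of word tokens that are known ChatGPT-associated markers."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in MARKERS for t in tokens) / len(tokens)

def looks_polished(text: str, threshold: float = 0.01) -> bool:
    """Flag a text whose marker rate exceeds an (illustrative) threshold."""
    return marker_rate(text) > threshold
```

In practice a single-word-rate rule is noisy on short texts, which is why the paper combines multiple signals into an ensemble and validates against self-declared AI use rather than relying on any one marker word.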