Dmitry Kobak, Rita González-Márquez, Emőke-Ágnes Horvát, Jan Lause
Science Advances, vol. 11, no. 27. Published 2025-07-02. DOI: 10.1126/sciadv.adt3813. https://www.science.org/doi/10.1126/sciadv.adt3813
Delving into LLM-assisted writing in biomedical publications through excess vocabulary
Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations, can produce inaccurate information, and reinforce existing biases. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the COVID pandemic.
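The abstract does not spell out how excess word usage is computed, so the following is only a minimal sketch of one way the idea could be expressed. It assumes that a word's expected 2024 frequency is a straight-line extrapolation from pre-LLM years, and that the gap between observed and expected frequency is the quantity of interest. The function name, the choice of baseline years, and the toy frequencies are all hypothetical and are not taken from the paper.

```python
def excess_frequency(freq_by_year, target_year, baseline_years):
    """Return (gap, expected) for one word.

    freq_by_year: dict mapping year -> fraction of abstracts containing the word.
    expected: least-squares linear extrapolation from baseline_years
              (an assumption of this sketch, not necessarily the authors' method).
    gap: observed frequency in target_year minus expected frequency.
    """
    xs = list(baseline_years)
    ys = [freq_by_year[y] for y in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    # Extrapolate the baseline trend to the target year; a frequency cannot be negative.
    expected = max(mean_y + slope * (target_year - mean_x), 0.0)
    observed = freq_by_year[target_year]
    return observed - expected, expected


# Hypothetical toy data for a single style word (not real PubMed counts).
toy_freq = {2019: 0.0010, 2020: 0.0011, 2021: 0.0012, 2022: 0.0013, 2024: 0.0100}
gap, expected = excess_frequency(toy_freq, 2024, [2019, 2020, 2021, 2022])
print(f"expected={expected:.4f}, excess gap={gap:.4f}")
```

Under these assumptions the gap can be read as a lower bound: if a word appears in a fraction of abstracts beyond what the earlier trend predicts, at least that fraction of abstracts was affected, since not every LLM-processed abstract will contain that particular word.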
About the journal:
Science Advances, an open-access journal by AAAS, publishes impactful research in diverse scientific areas. It aims for fair, fast, and expert peer review, providing freely accessible research to readers. Led by distinguished scientists, the journal supports AAAS's mission by extending Science magazine's capacity to identify and promote significant advances. Evolving digital publishing technologies play a crucial role in advancing AAAS's global mission for science communication and benefitting humankind.