Large language models show both individual and collective creativity comparable to humans

Luning Sun, Yuzhuo Yuan, Yuan Yao, Yanyan Li, Hao Zhang, Xing Xie, Xiting Wang, Fang Luo, David Stillwell

Thinking Skills and Creativity, vol. 57, Article 101870 (published 2025-05-14). DOI: 10.1016/j.tsc.2025.101870. https://www.sciencedirect.com/science/article/pii/S1871187125001191
Abstract
Artificial intelligence, especially large language models (LLMs), is increasingly being adopted in the workplace, which has significant implications for the future of work if these models show creativity comparable to that of humans. To measure the creativity of LLMs holistically, the current study uses thirteen creative tasks spanning three domains. We benchmark the LLMs against individual humans, and also take a novel approach by comparing them to the collective creativity of groups of humans. We find that the best LLMs (Claude and GPT-4) rank at the 52nd percentile against humans, and that LLMs overall excel in divergent thinking and problem solving but lag in creative writing. We also show that the collective creativity of 10 LLM responses is equivalent to that of 8–10 humans. Beyond 10 LLM responses, two additional LLM responses contribute as much incremental collective creativity as one additional human. Ultimately, LLMs, when optimally applied, may compete with a small group of humans in the future of work.
About the Journal
Thinking Skills and Creativity is a new journal providing a peer-reviewed forum for communication and debate for the community of researchers interested in teaching for thinking and creativity. Papers may represent a variety of theoretical perspectives and methodological approaches and may relate to any age level in a diversity of settings: formal and informal, education and work-based.