{"title":"A large-scale replication of scenario-based experiments in psychology and management using large language models.","authors":"Ziyan Cui, Ning Li, Huaikang Zhou","doi":"10.1038/s43588-025-00840-7","DOIUrl":null,"url":null,"abstract":"<p><p>We conducted a large-scale study replicating 156 psychological experiments from top social science journals using three state-of-the-art large language models (LLMs). Our results reveal that, while LLMs demonstrated high replication rates for main effects (73-81%) and moderate to strong success with interaction effects (46-63%), they consistently produced larger effect sizes than human studies. Notably, LLMs showed significantly lower replication rates for studies involving socially sensitive topics such as race, gender and ethics. When original studies reported null findings, LLMs produced significant results at remarkably high rates (68-83%); while this could reflect cleaner data with less noise, it also suggests potential risks of effect size overestimation. Our results demonstrate both the promises and the challenges of LLMs in psychological research: while LLMs are efficient tools for pilot testing and rapid hypothesis validation, enriching rather than replacing traditional human-participant studies, they require more nuanced interpretation and human validation for complex social phenomena and culturally sensitive research questions.</p>","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":" ","pages":""},"PeriodicalIF":12.0000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature computational science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s43588-025-00840-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Abstract
We conducted a large-scale study replicating 156 psychological experiments from top social science journals using three state-of-the-art large language models (LLMs). Our results reveal that, while LLMs demonstrated high replication rates for main effects (73-81%) and moderate to strong success with interaction effects (46-63%), they consistently produced larger effect sizes than human studies. Notably, LLMs showed significantly lower replication rates for studies involving socially sensitive topics such as race, gender and ethics. When original studies reported null findings, LLMs produced significant results at remarkably high rates (68-83%); while this could reflect cleaner, less noisy data, it also points to a risk of effect-size overestimation. Our results demonstrate both the promise and the challenges of LLMs in psychological research: LLMs are efficient tools for pilot testing and rapid hypothesis validation, enriching rather than replacing traditional human-participant studies, but they require more nuanced interpretation and human validation for complex social phenomena and culturally sensitive research questions.
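The replication criterion implied by the abstract, a significant effect in the same direction as the original study, can be made concrete with a minimal sketch. The code below is purely illustrative and is not the authors' pipeline: the data are simulated, and the function names (`replicates`, `cohens_d`), the two-condition design, and the alpha threshold are all our assumptions.

```python
# A minimal sketch (not the authors' pipeline) of a same-direction
# significance check for one simulated two-condition main effect.
# All data and names here are hypothetical.
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Standardized mean difference using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1)
                  + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


def replicates(llm_a, llm_b, original_d, alpha=0.05):
    """An LLM result counts as a replication if it is significant
    (Welch t-test, p < alpha) in the same direction as the original."""
    _, p = stats.ttest_ind(llm_a, llm_b, equal_var=False)
    d = cohens_d(llm_a, llm_b)
    return (p < alpha) and (np.sign(d) == np.sign(original_d)), d


rng = np.random.default_rng(0)
# Hypothetical scenario: the original human study found d = 0.40, while
# LLM "participants" respond with less noise, inflating the effect size.
llm_treatment = rng.normal(0.8, 0.6, 100)
llm_control = rng.normal(0.0, 0.6, 100)
ok, d_llm = replicates(llm_treatment, llm_control, original_d=0.40)
print(f"replicated={ok}, LLM d={d_llm:.2f} vs original d=0.40")
```

Under these assumptions the check would report a replication with an LLM effect size well above the original d = 0.40, mirroring the pattern of inflated effect sizes the abstract describes.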