{"title":"The Mixed Subjects Design: Treating Large Language Models as Potentially Informative Observations","authors":"David Broska, Michael Howes, Austin van Loon","doi":"10.1177/00491241251326865","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) provide cost-effective but possibly inaccurate predictions of human behavior. Despite growing evidence that predicted and observed behavior are often not <jats:italic>interchangeable</jats:italic> , there is limited guidance on using LLMs to obtain valid estimates of causal effects and other parameters. We argue that LLM predictions should be treated as potentially informative observations, while human subjects serve as a gold standard in a <jats:italic>mixed subjects design</jats:italic> . This paradigm preserves validity and offers more precise estimates at a lower cost than experiments relying exclusively on human subjects. We demonstrate—and extend—prediction-powered inference (PPI), a method that combines predictions and observations. We define the <jats:italic>PPI correlation</jats:italic> as a measure of interchangeability and derive the <jats:italic>effective sample size</jats:italic> for PPI. We also introduce a power analysis to optimally choose between <jats:italic>informative but costly</jats:italic> human subjects and <jats:italic>less informative but cheap</jats:italic> predictions of human behavior. Mixed subjects designs could enhance scientific productivity and reduce inequality in access to costly evidence.","PeriodicalId":21849,"journal":{"name":"Sociological Methods & Research","volume":"4 1","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sociological Methods & Research","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1177/00491241251326865","RegionNum":2,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOCIAL SCIENCES, MATHEMATICAL METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Large language models (LLMs) provide cost-effective but possibly inaccurate predictions of human behavior. Despite growing evidence that predicted and observed behavior are often not interchangeable , there is limited guidance on using LLMs to obtain valid estimates of causal effects and other parameters. We argue that LLM predictions should be treated as potentially informative observations, while human subjects serve as a gold standard in a mixed subjects design . This paradigm preserves validity and offers more precise estimates at a lower cost than experiments relying exclusively on human subjects. We demonstrate—and extend—prediction-powered inference (PPI), a method that combines predictions and observations. We define the PPI correlation as a measure of interchangeability and derive the effective sample size for PPI. We also introduce a power analysis to optimally choose between informative but costly human subjects and less informative but cheap predictions of human behavior. Mixed subjects designs could enhance scientific productivity and reduce inequality in access to costly evidence.
期刊介绍:
Sociological Methods & Research is a quarterly journal devoted to sociology as a cumulative empirical science. The objectives of SMR are multiple, but emphasis is placed on articles that advance the understanding of the field through systematic presentations that clarify methodological problems and assist in ordering the known facts in an area. Review articles will be published, particularly those that emphasize a critical analysis of the status of the arts, but original presentations that are broadly based and provide new research will also be published. Intrinsically, SMR is viewed as substantive journal but one that is highly focused on the assessment of the scientific status of sociology. The scope is broad and flexible, and authors are invited to correspond with the editors about the appropriateness of their articles.