{"title":"ChatGPT vs Social Surveys: Probing the Objective and Subjective Human Society","authors":"Muzhi Zhou, Lu Yu, Xiaomin Geng, Lan Luo","doi":"arxiv-2409.02601","DOIUrl":null,"url":null,"abstract":"The extent to which Large Language Models (LLMs) can simulate the\ndata-generating process for social surveys remains unclear. Current research\nhas not thoroughly assessed potential biases in the sociodemographic population\nrepresented within the language model's framework. Additionally, the subjective\nworlds of LLMs often show inconsistencies in how closely their responses match\nthose of groups of human respondents. In this paper, we used ChatGPT-3.5 to\nsimulate the sampling process and generated six socioeconomic characteristics\nfrom the 2020 US population. We also analyzed responses to questions about\nincome inequality and gender roles to explore GPT's subjective attitudes. By\nusing repeated random sampling, we created a sampling distribution to identify\nthe parameters of the GPT-generated population and compared these with Census\ndata. Our findings show some alignment in gender and age means with the actual\n2020 US population, but we also found mismatches in the distributions of racial\nand educational groups. Furthermore, there were significant differences between\nthe distribution of GPT's responses and human self-reported attitudes. While\nthe overall point estimates of GPT's income attitudinal responses seem to align\nwith the mean of the population occasionally, their response distributions\nfollow a normal distribution that diverges from human responses. In terms of\ngender relations, GPT's answers tend to cluster in the most frequently answered\ncategory, demonstrating a deterministic pattern. We conclude by emphasizing the\ndistinct design philosophies of LLMs and social surveys: LLMs aim to predict\nthe most suitable answers, while social surveys seek to reveal the\nheterogeneity among social groups.","PeriodicalId":501112,"journal":{"name":"arXiv - CS - Computers and Society","volume":"35 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computers and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.02601","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The extent to which Large Language Models (LLMs) can simulate the data-generating process for social surveys remains unclear. Existing research has not thoroughly assessed potential biases in the sociodemographic populations represented within these models, and the subjective worlds of LLMs often show inconsistencies in how closely their responses match those of human respondent groups. In this paper, we used ChatGPT-3.5 to simulate the sampling process and generated six socioeconomic characteristics of the 2020 US population. We also analyzed responses to questions about income inequality and gender roles to probe GPT's subjective attitudes. Using repeated random sampling, we constructed a sampling distribution to estimate the parameters of the GPT-generated population and compared them with Census data.
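The following is a minimal Python sketch of this repeated-random-sampling procedure, not the authors' code: `query_llm_respondent` is a hypothetical placeholder for a ChatGPT-3.5 API call, and the census benchmark is an illustrative constant rather than an actual 2020 Census figure. Each simulated sample yields one estimate of a population parameter (here, mean age), and the collected estimates form the sampling distribution.

```python
import random
import statistics


def query_llm_respondent(rng: random.Random) -> float:
    """Hypothetical stand-in for a ChatGPT-3.5 API call that returns one
    simulated respondent's age; a real implementation would prompt the
    model and parse its reply."""
    return rng.gauss(38.0, 15.0)  # toy distribution, not model output


def sampling_distribution(n_samples: int, sample_size: int,
                          seed: int = 0) -> list[float]:
    """Draw `n_samples` simulated samples of `sample_size` respondents
    each and record every sample's mean age, building the sampling
    distribution of the mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(query_llm_respondent(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    CENSUS_MEAN_AGE = 38.5  # placeholder benchmark, not 2020 Census data
    means = sampling_distribution(n_samples=200, sample_size=100)
    print(f"estimated mean age: {statistics.mean(means):.2f} "
          f"(SE {statistics.stdev(means):.2f}) vs census {CENSUS_MEAN_AGE:.2f}")
```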
Our findings show some alignment between the gender and age means and the actual 2020 US population, but mismatches in the distributions of racial and educational groups. Furthermore, the distribution of GPT's responses differs significantly from human self-reported attitudes. Although the point estimates of GPT's income-attitude responses occasionally align with the population mean, its response distributions follow an approximately normal shape that diverges from the human responses. On gender relations, GPT's answers cluster in the most frequently chosen category, a deterministic pattern. We conclude by emphasizing the distinct design philosophies of LLMs and social surveys: LLMs aim to predict the single most suitable answer, whereas social surveys seek to reveal the heterogeneity among social groups.
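A minimal sketch, with entirely made-up counts rather than the paper's data, of two quantities that could capture the attitudinal findings above: a Pearson chi-square statistic for the divergence between GPT's and humans' response distributions over a Likert scale, and the modal share, the fraction of answers in the most frequent category, which reflects GPT's clustering on a single response.

```python
def chi_square(observed: list[int], expected: list[int]) -> float:
    """Pearson chi-square statistic between two count vectors, after
    rescaling `expected` to the observed total."""
    scale = sum(observed) / sum(expected)
    return sum((o - e * scale) ** 2 / (e * scale)
               for o, e in zip(observed, expected))


def modal_share(counts: list[int]) -> float:
    """Fraction of responses falling in the most frequent category."""
    return max(counts) / sum(counts)


if __name__ == "__main__":
    # Hypothetical counts over a 1-7 agreement scale (illustrative only).
    human = [30, 45, 60, 80, 70, 55, 40]  # heterogeneous human answers
    gpt = [2, 5, 15, 40, 290, 20, 8]      # GPT clustering on one category

    print(f"chi-square: {chi_square(gpt, human):.1f}")
    print(f"modal share, human: {modal_share(human):.2f}; "
          f"GPT: {modal_share(gpt):.2f}")
```

A large chi-square value and a modal share near 1 for GPT, against a much lower modal share for humans, would correspond to the deterministic clustering pattern the abstract describes.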