{"title":"ChatGPT vs Social Surveys: Probing the Objective and Subjective Human Society","authors":"Muzhi Zhou, Lu Yu, Xiaomin Geng, Lan Luo","doi":"arxiv-2409.02601","DOIUrl":null,"url":null,"abstract":"The extent to which Large Language Models (LLMs) can simulate the\ndata-generating process for social surveys remains unclear. Current research\nhas not thoroughly assessed potential biases in the sociodemographic population\nrepresented within the language model's framework. Additionally, the subjective\nworlds of LLMs often show inconsistencies in how closely their responses match\nthose of groups of human respondents. In this paper, we used ChatGPT-3.5 to\nsimulate the sampling process and generated six socioeconomic characteristics\nfrom the 2020 US population. We also analyzed responses to questions about\nincome inequality and gender roles to explore GPT's subjective attitudes. By\nusing repeated random sampling, we created a sampling distribution to identify\nthe parameters of the GPT-generated population and compared these with Census\ndata. Our findings show some alignment in gender and age means with the actual\n2020 US population, but we also found mismatches in the distributions of racial\nand educational groups. Furthermore, there were significant differences between\nthe distribution of GPT's responses and human self-reported attitudes. While\nthe overall point estimates of GPT's income attitudinal responses seem to align\nwith the mean of the population occasionally, their response distributions\nfollow a normal distribution that diverges from human responses. In terms of\ngender relations, GPT's answers tend to cluster in the most frequently answered\ncategory, demonstrating a deterministic pattern. We conclude by emphasizing the\ndistinct design philosophies of LLMs and social surveys: LLMs aim to predict\nthe most suitable answers, while social surveys seek to reveal the\nheterogeneity among social groups.","PeriodicalId":501112,"journal":{"name":"arXiv - CS - Computers and Society","volume":"35 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computers and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.02601","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The extent to which Large Language Models (LLMs) can simulate the data-generating process for social surveys remains unclear. Existing research has not thoroughly assessed potential biases in the sociodemographic populations represented within these models, and the subjective worlds of LLMs often show inconsistencies in how closely their responses match those of human respondent groups. In this paper, we used ChatGPT-3.5 to simulate the sampling process and generated six socioeconomic characteristics of the 2020 US population. We also analyzed responses to questions about income inequality and gender roles to probe GPT's subjective attitudes. Using repeated random sampling, we constructed a sampling distribution to estimate the parameters of the GPT-generated population and compared them with Census data.
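The following is a minimal Python sketch of this repeated-random-sampling procedure, not the authors' code: `query_llm_respondent` is a hypothetical placeholder for a ChatGPT-3.5 API call, and the census benchmark is an illustrative constant rather than an actual 2020 Census figure. Each simulated sample yields one estimate of a population parameter (here, mean age), and the collected estimates form the sampling distribution.

```python
import random
import statistics


def query_llm_respondent(rng: random.Random) -> float:
    """Hypothetical stand-in for a ChatGPT-3.5 API call that returns one
    simulated respondent's age; a real implementation would prompt the
    model and parse its reply."""
    return rng.gauss(38.0, 15.0)  # toy distribution, not model output


def sampling_distribution(n_samples: int, sample_size: int,
                          seed: int = 0) -> list[float]:
    """Draw `n_samples` simulated samples of `sample_size` respondents
    each and record every sample's mean age, building the sampling
    distribution of the mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(query_llm_respondent(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    CENSUS_MEAN_AGE = 38.5  # placeholder benchmark, not 2020 Census data
    means = sampling_distribution(n_samples=200, sample_size=100)
    print(f"estimated mean age: {statistics.mean(means):.2f} "
          f"(SE {statistics.stdev(means):.2f}) vs census {CENSUS_MEAN_AGE:.2f}")
```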
Our findings show some alignment between the gender and age means and the actual 2020 US population, but mismatches in the distributions of racial and educational groups. Furthermore, the distribution of GPT's responses differs significantly from human self-reported attitudes. Although the point estimates of GPT's income-attitude responses occasionally align with the population mean, its response distributions follow an approximately normal shape that diverges from the human responses. On gender relations, GPT's answers cluster in the most frequently chosen category, a deterministic pattern. We conclude by emphasizing the distinct design philosophies of LLMs and social surveys: LLMs aim to predict the single most suitable answer, whereas social surveys seek to reveal the heterogeneity among social groups.
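A minimal sketch, with entirely made-up counts rather than the paper's data, of two quantities that could capture the attitudinal findings above: a Pearson chi-square statistic for the divergence between GPT's and humans' response distributions over a Likert scale, and the modal share, the fraction of answers in the most frequent category, which reflects GPT's clustering on a single response.

```python
def chi_square(observed: list[int], expected: list[int]) -> float:
    """Pearson chi-square statistic between two count vectors, after
    rescaling `expected` to the observed total."""
    scale = sum(observed) / sum(expected)
    return sum((o - e * scale) ** 2 / (e * scale)
               for o, e in zip(observed, expected))


def modal_share(counts: list[int]) -> float:
    """Fraction of responses falling in the most frequent category."""
    return max(counts) / sum(counts)


if __name__ == "__main__":
    # Hypothetical counts over a 1-7 agreement scale (illustrative only).
    human = [30, 45, 60, 80, 70, 55, 40]  # heterogeneous human answers
    gpt = [2, 5, 15, 40, 290, 20, 8]      # GPT clustering on one category

    print(f"chi-square: {chi_square(gpt, human):.1f}")
    print(f"modal share, human: {modal_share(human):.2f}; "
          f"GPT: {modal_share(gpt):.2f}")
```

A large chi-square value and a modal share near 1 for GPT, against a much lower modal share for humans, would correspond to the deterministic clustering pattern the abstract describes.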