模拟公众意见：比较llm和随机森林的分布和个人水平预测。

IF 2 3区物理与天体物理 Q2 PHYSICS, MULTIDISCIPLINARY

Entropy Pub Date : 2025-09-02 DOI:10.3390/e27090923

Fernando Miranda, Pedro Paulo Balbi

{"title":"模拟公众意见：比较llm和随机森林的分布和个人水平预测。","authors":"Fernando Miranda, Pedro Paulo Balbi","doi":"10.3390/e27090923","DOIUrl":null,"url":null,"abstract":"Understanding and modeling the flow of information in human societies is essential for capturing phenomena such as polarization, opinion formation, and misinformation diffusion. Traditional agent-based models often rely on simplified behavioral rules that fail to capture the nuanced and context-sensitive nature of human decision-making. In this study, we explore the potential of Large Language Models (LLMs) as data-driven, high-fidelity agents capable of simulating individual opinions under varying informational conditions. Conditioning LLMs on real survey data from the 2020 American National Election Studies (ANES), we investigate their ability to predict individual-level responses across a spectrum of political and social issues in a zero-shot setting, without any training on the survey outcomes. Using Jensen-Shannon distance to quantify divergence in opinion distributions and F1-score to measure predictive accuracy, we compare LLM-generated simulations to those produced by a supervised Random Forest model. While performance at the individual level is comparable, LLMs consistently produce aggregate opinion distributions closer to the empirical ground truth. These findings suggest that LLMs offer a promising new method for simulating complex opinion dynamics and modeling the probabilistic structure of belief systems in computational social science.","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 9","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468613/pdf/","citationCount":"0","resultStr":"{\"title\":\"Simulating Public Opinion: Comparing Distributional and Individual-Level Predictions from LLMs and Random Forests.\",\"authors\":\"Fernando Miranda, Pedro Paulo Balbi\",\"doi\":\"10.3390/e27090923\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding and modeling the flow of information in human societies is essential for capturing phenomena such as polarization, opinion formation, and misinformation diffusion. Traditional agent-based models often rely on simplified behavioral rules that fail to capture the nuanced and context-sensitive nature of human decision-making. In this study, we explore the potential of Large Language Models (LLMs) as data-driven, high-fidelity agents capable of simulating individual opinions under varying informational conditions. Conditioning LLMs on real survey data from the 2020 American National Election Studies (ANES), we investigate their ability to predict individual-level responses across a spectrum of political and social issues in a zero-shot setting, without any training on the survey outcomes. Using Jensen-Shannon distance to quantify divergence in opinion distributions and F1-score to measure predictive accuracy, we compare LLM-generated simulations to those produced by a supervised Random Forest model. While performance at the individual level is comparable, LLMs consistently produce aggregate opinion distributions closer to the empirical ground truth. These findings suggest that LLMs offer a promising new method for simulating complex opinion dynamics and modeling the probabilistic structure of belief systems in computational social science.\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 9\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468613/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27090923\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27090923","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

理解和模拟人类社会中的信息流对于捕捉两极分化、意见形成和错误信息传播等现象至关重要。传统的基于主体的模型通常依赖于简化的行为规则，无法捕捉人类决策的细微差别和上下文敏感性。在这项研究中，我们探索了大型语言模型（llm）作为数据驱动的高保真代理的潜力，能够在不同的信息条件下模拟个人意见。根据2020年美国全国选举研究（ANES）的真实调查数据，我们调查了法学硕士在零射击环境下预测个人层面对一系列政治和社会问题反应的能力，而没有对调查结果进行任何培训。我们使用Jensen-Shannon距离来量化意见分布的分歧，并使用f1分数来衡量预测准确性，将llm生成的模拟与监督随机森林模型产生的模拟进行比较。虽然在个人层面上的表现是可比性的，但法学硕士始终产生更接近经验基础真理的总体意见分布。这些发现表明，llm为模拟复杂的意见动态和模拟计算社会科学中信念系统的概率结构提供了一种有前途的新方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Simulating Public Opinion: Comparing Distributional and Individual-Level Predictions from LLMs and Random Forests.

Understanding and modeling the flow of information in human societies is essential for capturing phenomena such as polarization, opinion formation, and misinformation diffusion. Traditional agent-based models often rely on simplified behavioral rules that fail to capture the nuanced and context-sensitive nature of human decision-making. In this study, we explore the potential of Large Language Models (LLMs) as data-driven, high-fidelity agents capable of simulating individual opinions under varying informational conditions. Conditioning LLMs on real survey data from the 2020 American National Election Studies (ANES), we investigate their ability to predict individual-level responses across a spectrum of political and social issues in a zero-shot setting, without any training on the survey outcomes. Using Jensen-Shannon distance to quantify divergence in opinion distributions and F1-score to measure predictive accuracy, we compare LLM-generated simulations to those produced by a supervised Random Forest model. While performance at the individual level is comparable, LLMs consistently produce aggregate opinion distributions closer to the empirical ground truth. These findings suggest that LLMs offer a promising new method for simulating complex opinion dynamics and modeling the probabilistic structure of belief systems in computational social science.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Entropy PHYSICS, MULTIDISCIPLINARY-

CiteScore

4.90

自引率

11.10%

发文量

1580

审稿时长

21.05 days

期刊介绍： Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.