How Well Do Simulated Population Samples with GPT-4 Align with Real Ones? The Case of the Eysenck Personality Questionnaire Revised-Abbreviated Personality Test.
Gregorio Ferreira, Jacopo Amidei, Rubén Nieto, Andreas Kaltenbrunner
{"title":"How Well Do Simulated Population Samples with GPT-4 Align with Real Ones? The Case of the Eysenck Personality Questionnaire Revised-Abbreviated Personality Test.","authors":"Gregorio Ferreira, Jacopo Amidei, Rubén Nieto, Andreas Kaltenbrunner","doi":"10.34133/hds.0284","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background:</b> Advances in artificial intelligence have enabled the simulation of human-like behaviors, raising the possibility of using large language models (LLMs) to generate synthetic population samples for research purposes, which may be particularly useful in health and social sciences. <b>Methods:</b> This paper explores the potential of LLMs to simulate population samples mirroring real ones, as well as the feasibility of using personality questionnaires to assess the personality of LLMs. To advance in that direction, 2 experiments were conducted with GPT-4o using the Eysenck Personality Questionnaire Revised-Abbreviated (EPQR-A) in 6 languages: Spanish, English, Slovak, Hebrew, Portuguese, and Turkish. <b>Results:</b> We find that GPT-4o exhibits distinct personality traits, which vary based on parameter settings and the language of the questionnaire. While the model shows promising trends in reflecting certain personality traits and differences across gender and academic fields, discrepancies between the synthetic populations' responses and those from real populations remain. <b>Conclusions:</b> These inconsistencies suggest that creating fully reliable synthetic population samples for questionnaire testing is still an open challenge. Further research is required to better align synthetic and real population behaviors.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"5 ","pages":"0284"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217932/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34133/hds.0284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Background: Advances in artificial intelligence have enabled the simulation of human-like behaviors, raising the possibility of using large language models (LLMs) to generate synthetic population samples for research purposes, which may be particularly useful in health and social sciences.

Methods: This paper explores the potential of LLMs to simulate population samples mirroring real ones, as well as the feasibility of using personality questionnaires to assess the personality of LLMs. To advance in that direction, 2 experiments were conducted with GPT-4o using the Eysenck Personality Questionnaire Revised-Abbreviated (EPQR-A) in 6 languages: Spanish, English, Slovak, Hebrew, Portuguese, and Turkish.

Results: We find that GPT-4o exhibits distinct personality traits, which vary based on parameter settings and the language of the questionnaire. While the model shows promising trends in reflecting certain personality traits and differences across gender and academic fields, discrepancies between the synthetic populations' responses and those from real populations remain.

Conclusions: These inconsistencies suggest that creating fully reliable synthetic population samples for questionnaire testing is still an open challenge. Further research is required to better align synthetic and real population behaviors.
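The Methods section describes administering the EPQR-A to GPT-4o in several languages while varying generation parameters. As a minimal sketch of what such a setup might look like, assuming the OpenAI Python client is used, the example below submits yes/no questionnaire items to the gpt-4o model at a chosen temperature. The item texts, system prompt, and temperature value are illustrative placeholders, not the authors' actual prompts or settings.

```python
# Hypothetical sketch: administering questionnaire items to GPT-4o via the
# OpenAI API, varying language and temperature. Items and prompts below are
# placeholders; the real EPQR-A has 24 yes/no items per language version.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EPQR_A_ITEMS = {
    "English": [
        "Does your mood often go up and down?",
        "Are you a talkative person?",
    ],
    "Spanish": [
        "¿Su estado de ánimo sube y baja a menudo?",
        "¿Es usted una persona habladora?",
    ],
}

def administer_item(item: str, language: str, temperature: float) -> str:
    """Ask GPT-4o to answer one questionnaire item with YES or NO."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=temperature,
        messages=[
            {
                "role": "system",
                "content": f"You are a survey respondent. Answer in {language} "
                           "with only YES or NO.",
            },
            {"role": "user", "content": item},
        ],
    )
    return response.choices[0].message.content.strip()

# Simulate one synthetic respondent per language at temperature 1.0.
for language, items in EPQR_A_ITEMS.items():
    answers = [administer_item(item, language, temperature=1.0) for item in items]
    print(language, answers)
```

In a full replication, one would repeat such calls many times per language and parameter setting to build synthetic samples whose trait scores can then be compared against published EPQR-A norms for real populations.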