How Well Do Simulated Population Samples with GPT-4 Align with Real Ones? The Case of the Eysenck Personality Questionnaire Revised-Abbreviated Personality Test.
Gregorio Ferreira, Jacopo Amidei, Rubén Nieto, Andreas Kaltenbrunner
{"title":"How Well Do Simulated Population Samples with GPT-4 Align with Real Ones? The Case of the Eysenck Personality Questionnaire Revised-Abbreviated Personality Test.","authors":"Gregorio Ferreira, Jacopo Amidei, Rubén Nieto, Andreas Kaltenbrunner","doi":"10.34133/hds.0284","DOIUrl":null,"url":null,"abstract":"<p><p><b>Background:</b> Advances in artificial intelligence have enabled the simulation of human-like behaviors, raising the possibility of using large language models (LLMs) to generate synthetic population samples for research purposes, which may be particularly useful in health and social sciences. <b>Methods:</b> This paper explores the potential of LLMs to simulate population samples mirroring real ones, as well as the feasibility of using personality questionnaires to assess the personality of LLMs. To advance in that direction, 2 experiments were conducted with GPT-4o using the Eysenck Personality Questionnaire Revised-Abbreviated (EPQR-A) in 6 languages: Spanish, English, Slovak, Hebrew, Portuguese, and Turkish. <b>Results:</b> We find that GPT-4o exhibits distinct personality traits, which vary based on parameter settings and the language of the questionnaire. While the model shows promising trends in reflecting certain personality traits and differences across gender and academic fields, discrepancies between the synthetic populations' responses and those from real populations remain. <b>Conclusions:</b> These inconsistencies suggest that creating fully reliable synthetic population samples for questionnaire testing is still an open challenge. Further research is required to better align synthetic and real population behaviors.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"5 ","pages":"0284"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217932/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34133/hds.0284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Background: Advances in artificial intelligence have enabled the simulation of human-like behaviors, raising the possibility of using large language models (LLMs) to generate synthetic population samples for research purposes, which may be particularly useful in health and social sciences.

Methods: This paper explores the potential of LLMs to simulate population samples mirroring real ones, as well as the feasibility of using personality questionnaires to assess the personality of LLMs. To advance in that direction, 2 experiments were conducted with GPT-4o using the Eysenck Personality Questionnaire Revised-Abbreviated (EPQR-A) in 6 languages: Spanish, English, Slovak, Hebrew, Portuguese, and Turkish.

Results: We find that GPT-4o exhibits distinct personality traits, which vary based on parameter settings and the language of the questionnaire. While the model shows promising trends in reflecting certain personality traits and differences across gender and academic fields, discrepancies between the synthetic populations' responses and those from real populations remain.

Conclusions: These inconsistencies suggest that creating fully reliable synthetic population samples for questionnaire testing is still an open challenge. Further research is required to better align synthetic and real population behaviors.
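The Methods section describes administering the EPQR-A to GPT-4o in several languages while varying generation parameters. As a minimal sketch of what such a setup might look like, assuming the OpenAI Python client is used, the example below submits yes/no questionnaire items to the gpt-4o model at a chosen temperature. The item texts, system prompt, and temperature value are illustrative placeholders, not the authors' actual prompts or settings.

```python
# Hypothetical sketch: administering questionnaire items to GPT-4o via the
# OpenAI API, varying language and temperature. Items and prompts below are
# placeholders; the real EPQR-A has 24 yes/no items per language version.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EPQR_A_ITEMS = {
    "English": [
        "Does your mood often go up and down?",
        "Are you a talkative person?",
    ],
    "Spanish": [
        "¿Su estado de ánimo sube y baja a menudo?",
        "¿Es usted una persona habladora?",
    ],
}

def administer_item(item: str, language: str, temperature: float) -> str:
    """Ask GPT-4o to answer one questionnaire item with YES or NO."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=temperature,
        messages=[
            {
                "role": "system",
                "content": f"You are a survey respondent. Answer in {language} "
                           "with only YES or NO.",
            },
            {"role": "user", "content": item},
        ],
    )
    return response.choices[0].message.content.strip()

# Simulate one synthetic respondent per language at temperature 1.0.
for language, items in EPQR_A_ITEMS.items():
    answers = [administer_item(item, language, temperature=1.0) for item in items]
    print(language, answers)
```

In a full replication, one would repeat such calls many times per language and parameter setting to build synthetic samples whose trait scores can then be compared against published EPQR-A norms for real populations.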