Cognitive phantoms in large language models through the lens of latent variables

Computers in Human Behavior: Artificial Humans. Pub Date: 2025-05-01. DOI: 10.1016/j.chbah.2025.100161

Sanne Peereboom, Inga Schwabe, Bennett Kleinberg

Volume 4, Article 100161. Full text: https://www.sciencedirect.com/science/article/pii/S2949882125000453

Citations: 0

Abstract




Large language models (LLMs) increasingly reach real-world applications, necessitating a better understanding of their behaviour. Their size and complexity complicate traditional assessment methods, causing the emergence of alternative approaches inspired by the field of psychology. Recent studies administering psychometric questionnaires to LLMs report human-like traits in LLMs, potentially influencing LLM behaviour. However, this approach suffers from a validity problem: it presupposes that these traits exist in LLMs and that they are measurable with tools designed for humans. Typical procedures rarely acknowledge the validity problem in LLMs, comparing and interpreting average LLM scores. This study investigates this problem by comparing latent structures of personality between humans and three LLMs using two validated personality questionnaires. Findings suggest that questionnaires designed for humans do not validly measure similar constructs in LLMs, and that these constructs may not exist in LLMs at all, highlighting the need for psychometric analyses of LLM responses to avoid chasing cognitive phantoms.

