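The next frontier in mindreading? Assessing generative artificial intelligence (GAI)'s social-cognitive capabilities using dynamic audiovisual stimuli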

IF 5.8 Q1 PSYCHOLOGY, EXPERIMENTAL

Computers in human behavior reports Pub Date : 2025-05-23 DOI:10.1016/j.chbr.2025.100702

Elad Refoua , Zohar Elyoseph , Renata Wacker , Isabel Dziobek , Iftach Tsafrir , Gunther Meinlschmidt

Volume 19, Article 100702. Publication type: Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S2451958825001174

Citations: 0

Abstract



The integration of Generative Artificial Intelligence (GAI) into human social contexts has raised fundamental questions about machines' capacity to understand and respond to complex emotional and social dynamics. While recent studies have demonstrated GAI's promising capabilities in processing static emotional content, the frontier of dynamic social cognition – where multiple modalities converge to create naturalistic social scenarios – remained largely unexplored. This study advances our understanding by examining the social-cognitive capabilities of Google's Gemini 1.5 Pro model through its performance on the Movie for the Assessment of Social Cognition (MASC), a sophisticated instrument designed to evaluate mentalization abilities using dynamic audiovisual stimuli. We compared the model's performance to a human normative sample (N = 1230) across varying temperature settings (a parameter controlling the level of randomness in the AI's output, where lower values lead to more deterministic responses and higher values increase variability; set at 0, 0.5, and 1). Results revealed that Gemini 1.5 Pro consistently performed above chance across all conditions (all corrected ps < 0.001, Cohen's h range = 1.17–1.41) and significantly outperformed the human sample mean (Z = 2.24, p = .025; Glass's Δ = 0.92, 95% CI [0.11, 1.72]; Hedges' g = 0.92, 95% CI [0.12, 1.72]). Analysis of error patterns revealed a distribution between hyper-mentalizing (41.0%; over-attribution of mental states), hypo-mentalizing (46.2%; under-attribution of mental states), and non-mentalizing (12.8%; failure to recognize mental states) errors. These findings extend our understanding of artificial social cognition to complex multimodal processing while raising important questions about the nature of machine-based social understanding. The implications span theoretical considerations in artificial Theory of Mind to practical applications in mental health care and social skills training, though careful consideration is warranted regarding the fundamental differences between human and artificial social cognitive processing.
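Two quantities in the abstract may be easier to follow with a small illustration: the temperature parameter and Cohen's h. The sketch below is not the study's code. It assumes the common textbook formulation of temperature (dividing scores by the temperature before a softmax, with temperature 0 treated as greedy selection), which may differ from how Gemini 1.5 Pro applies it internally. The function names, the example logits, the accuracy of 0.80, and the chance level of 0.25 are all hypothetical values chosen for illustration and are not taken from the abstract.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return selection probabilities for the given scores.

    temperature == 0 is treated as greedy selection: all probability mass
    goes to the highest-scoring option (ties share the mass equally).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        top = max(logits)
        winners = [1.0 if x == top else 0.0 for x in logits]
        total = sum(winners)
        return [w / total for w in winners]
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical scores for four answer options (not taken from the study).
example_logits = [2.0, 1.0, 0.5, 0.0]
for t in (0, 0.5, 1):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])

# Hypothetical accuracy of 0.80 against an assumed chance level of 0.25.
print(round(cohens_h(0.80, 0.25), 2))
```

Under this formulation, lower temperatures concentrate probability on the top-scoring option, which matches the abstract's description of more deterministic responses at lower values. Cohen's h compares two proportions on an arcsine scale, so an observed accuracy can be expressed as an effect size relative to a chance level.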


Source journal

Computers in human behavior reports Psychology (General)

CiteScore

7.80

Self-citation rate

0.00%

Publication count