GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels.

Imaging Neuroscience (Cambridge, Mass.) | Pub Date: 2025-09-02 | eCollection Date: 2025-01-01 | DOI: 10.1162/IMAG.a.134
Severi Santavirta, Yuhang Wu, Lauri Suominen, Lauri Nummenmaa
{"title":"GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels.","authors":"Severi Santavirta, Yuhang Wu, Lauri Suominen, Lauri Nummenmaa","doi":"10.1162/IMAG.a.134","DOIUrl":null,"url":null,"abstract":"<p><p>Humans navigate the social world by rapidly perceiving social features from other people and their interaction. Recently, large-language models (LLMs) have achieved high-level visual capabilities for detailed object and scene content recognition and description. This raises the question whether LLMs can infer complex social information from images and videos, and whether the high-dimensional structure of the feature annotations aligns with that of humans. We collected evaluations for 138 social features from GPT-4V for images (N = 468) and videos (N = 234) that are derived from social movie scenes. These evaluations were compared with human evaluations (N = 2,254). The comparisons established that GPT-4V can achieve human-like capabilities at annotating individual social features. The GPT-4V social feature annotations also express similar structural representation compared to the human social perceptual structure (i.e., similar correlation matrix over all social feature annotations). Finally, we modeled hemodynamic responses (N = 97) to viewing socioemotional movie clips with feature annotations by human observers and GPT-4V. These results demonstrated that GPT-4V based stimulus models can also reveal the social perceptual network in the human brain highly similar to the stimulus models based on human annotations. These human-like annotation capabilities of LLMs could have a wide range of real-life applications ranging from health care to business and would open exciting new avenues for psychological and neuroscientific research.</p>","PeriodicalId":73341,"journal":{"name":"Imaging neuroscience (Cambridge, Mass.)","volume":"3 ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12410153/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Imaging neuroscience (Cambridge, Mass.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1162/IMAG.a.134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Humans navigate the social world by rapidly perceiving social features from other people and their interactions. Recently, large language models (LLMs) have achieved high-level visual capabilities for detailed object and scene recognition and description. This raises the question of whether LLMs can infer complex social information from images and videos, and whether the high-dimensional structure of their feature annotations aligns with that of humans. We collected evaluations of 138 social features from GPT-4V for images (N = 468) and videos (N = 234) derived from social movie scenes, and compared these with human evaluations (N = 2,254). The comparisons established that GPT-4V can achieve human-like performance in annotating individual social features. The GPT-4V annotations also express a structural representation similar to the human social perceptual structure (i.e., a similar correlation matrix over all social feature annotations). Finally, we modeled hemodynamic responses (N = 97) to viewing socioemotional movie clips using feature annotations from human observers and from GPT-4V. These analyses demonstrated that GPT-4V-based stimulus models reveal a social perceptual network in the human brain highly similar to that revealed by stimulus models based on human annotations. These human-like annotation capabilities of LLMs could have a wide range of real-life applications, from health care to business, and would open exciting new avenues for psychological and neuroscientific research.
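Two of the analysis steps summarized above lend themselves to a brief illustration. First, the "similar structural representation" claim amounts to comparing the feature-by-feature correlation matrices of the two annotation sources. The sketch below is not the authors' code: the array shapes, placeholder data, and the choice of Spearman correlation over the matrices' upper triangles are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): compare the correlation
# structure of human vs. GPT-4V social feature annotations.
# Shapes and data are placeholders for illustration only.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical rating matrices: rows = stimuli (e.g., 468 images),
# columns = the 138 social features.
human_ratings = rng.random((468, 138))  # stand-in for averaged human ratings
gpt4v_ratings = rng.random((468, 138))  # stand-in for GPT-4V ratings

# Feature-by-feature correlation matrix for each annotation source
# (the "structural representation" the abstract refers to).
human_corr = np.corrcoef(human_ratings, rowvar=False)  # 138 x 138
gpt4v_corr = np.corrcoef(gpt4v_ratings, rowvar=False)  # 138 x 138

# Correlate the unique off-diagonal entries of the two matrices.
iu = np.triu_indices_from(human_corr, k=1)
rho, p = spearmanr(human_corr[iu], gpt4v_corr[iu])
print(f"Structural similarity: Spearman rho = {rho:.3f} (p = {p:.3g})")
```

Second, the hemodynamic modeling step follows the general logic of an annotation-based encoding model: convolve each feature time course with a hemodynamic response function (HRF) and regress voxel time series on the resulting design matrix. Again a minimal sketch under assumed parameters (TR, scan count, a simple double-gamma HRF), not the study's actual pipeline:

```python
# Minimal sketch (assumed workflow, not the study's pipeline): build an
# fMRI stimulus model by convolving each feature time course with a
# canonical HRF, then fit voxelwise regression weights. TR, scan count,
# and all data below are illustrative assumptions.
import numpy as np
from scipy.stats import gamma

TR = 2.0                        # assumed repetition time (seconds)
n_scans, n_features = 300, 138

def canonical_hrf(tr, duration=32.0):
    """Simple double-gamma HRF sampled at the scan rate."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # response minus undershoot
    return h / h.sum()

rng = np.random.default_rng(0)
features = rng.random((n_scans, n_features))  # placeholder annotations per TR
hrf = canonical_hrf(TR)

# Design matrix: each feature convolved with the HRF, plus an intercept.
X = np.column_stack(
    [np.convolve(features[:, j], hrf)[:n_scans] for j in range(n_features)]
)
X = np.column_stack([X, np.ones(n_scans)])

bold = rng.standard_normal((n_scans, 1000))  # placeholder voxel time series
betas, *_ = np.linalg.lstsq(X, bold, rcond=None)
print(betas.shape)  # (139, 1000): one weight per feature (+ intercept) per voxel
```

In both sketches, swapping the random placeholders for real human and GPT-4V annotation matrices would reproduce the shape of the comparisons the abstract describes.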
