Evaluating the intelligence of large language models: A comparative study using verbal and visual IQ tests
Sherif Abdelkarim, David Lu, Dora-Luz Flores, Susanne Jaeggi, Pierre Baldi
Computers in Human Behavior: Artificial Humans, Vol. 5, Article 100170 (2025-06-18). DOI: 10.1016/j.chbah.2025.100170

Large language models (LLMs) excel on many specialized benchmarks, yet their general-reasoning ability remains opaque. We therefore test 18 models – including GPT-4, Claude 3, and Gemini Pro – on a 14-section IQ suite spanning verbal, numerical, and visual puzzles, and add a "multi-agent reflection" variant in which one model answers while others critique and revise. Results replicate known patterns: a strong bias towards verbal over numerical reasoning (GPT-4: 79% vs. 53% accuracy), a pronounced modality gap (text IQ ≈ 125 vs. visual IQ ≈ 103), and persistent failure on abstract arithmetic (≤ 20% on missing-number tasks). Scaling lifts mean IQ from 89 (tiny models) to 131 (large models), but gains are non-uniform, and reflection yields only modest extra points for frontier systems. Our contributions include: (1) proposing an evaluation framework for LLM "intelligence" using both verbal and visual IQ tasks; (2) analyzing how multi-agent setups with varying actor and critic sizes affect problem-solving performance; (3) analyzing how model size and multi-modality affect performance across diverse reasoning tasks; and (4) highlighting the value of IQ tests as a standardized, human-referenced benchmark that enables longitudinal comparison of LLMs' cognitive abilities relative to human norms. We further discuss the limitations of IQ tests as an AI benchmark and outline directions for more comprehensive evaluation of LLM reasoning capabilities.
Development and validation of a short AI literacy test (AILIT-S) for university students
Marie Hornberger, Arne Bewersdorff, Daniel S. Schiff, Claudia Nerdel
Computers in Human Behavior: Artificial Humans, Vol. 5, Article 100176 (2025-06-16). DOI: 10.1016/j.chbah.2025.100176

Fostering AI literacy is an important goal in higher education in many disciplines. Assessing AI literacy can inform researchers and educators on current AI literacy levels and provide insights into the effectiveness of learning and teaching in the field of AI. It can also inform decision-makers and policymakers about the successes and gaps with respect to AI literacy within certain institutions, populations, or countries. However, most of the available AI literacy tests are quite long and time-consuming; a short test of AI literacy would instead enable efficient measurement and facilitate better research and understanding. In this study, we develop and validate a short version of an existing validated AI literacy test. Based on a sample of 1,465 university students across three Western countries (Germany, UK, US), we select a subset of items according to content validity, coverage of different difficulty levels, and ability to discriminate between participants. The resulting short version, AILIT-S, consists of 10 items and can be used to assess AI literacy in under 5 minutes. While the shortened test is less reliable than the long version, it maintains high construct validity and has high congruent validity. We offer recommendations for researchers and practitioners on when to use the long or short version.
Using educational robotics to support motor, cognitive, and social skills in a child with spinal muscular atrophy. A single-case study
Antonella D'Amico, Giuseppina Paci, Laura di Domenico, Alessandro Geraci
Computers in Human Behavior: Artificial Humans, Vol. 5, Article 100175 (2025-06-13). DOI: 10.1016/j.chbah.2025.100175

This study reports the results of a single-case intervention involving a child with spinal muscular atrophy. The aim of the study was to promote fine motor skills, visual-motor integration, attentional behaviors, and learning. The treatment was based on the RE4BES protocol, which consists of a set of guidelines for conducting tailored educational robotics activities designed for children with special needs. We employed an experimental single-case ABA design, including Baseline 1 (A1), Treatment (B), and Baseline 2 (A2), with eight sessions per phase. The treatment phase involved activities with Blue-Bot and LEGO® WeDo 2.0. Results showed significant improvements in gross and fine motor skills from baseline to the treatment phase, with these gains maintained after the intervention. Moreover, in alignment with the main goals of school inclusion for people with special needs, results demonstrated that the intervention also improved awareness, flexibility, cooperation, and initiative within the classroom. Despite the study's limitations, the findings support the effectiveness of the RE4BES protocol and suggest that educational robotics can be a valuable tool in special education settings.
{"title":"Increased morality through social communication or decision situation worsens the acceptance of robo-advisors","authors":"Clarissa Sabrina Arlinghaus , Carolin Straßmann , Annika Dix","doi":"10.1016/j.chbah.2025.100173","DOIUrl":"10.1016/j.chbah.2025.100173","url":null,"abstract":"<div><div>This German study (<em>N</em> = 317) tests social communication (i.e., self-disclosure, content intimacy, relational continuity units, we-phrases) as a potential compensation strategy for algorithm aversion. Therefore, we explore the acceptance of a robot as an advisor in non-moral, somewhat moral, and very moral decision situations and compare the influence of two verbal communication styles of the robot (functional vs. social).</div><div>Subjects followed the robot's recommendation similarly often for both communication styles (functional vs. social), but more often in the non-moral decision situation than in the moral decision situations. Subjects perceived the robot as more human and more moral during social communication than during functional communication but similarly trustworthy, likable, and intelligent for both communication styles. In moral decision situations, subjects ascribed more anthropomorphism and morality but less trust, likability, and intelligence to the robot compared to the non-moral decision situation.</div><div>Subjects perceive the robot as more moral in social communication. This unexpectedly led to subjects following the robot's recommendation less often. No other mediation effects were found. From this we conclude, that the verbal communication style alone has a rather small influence on the robot's acceptance as an advisor for moral decision-making and does not reduce algorithm aversion. Potential reasons for this (e.g., multimodality, no visual changes), as well as implications (e.g., avoidance of self-disclosure in human-robot interaction) and limitations (e.g., video interaction) of this study, are discussed.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"5 ","pages":"Article 100173"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144263624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Of love & lasers: Perceptions of narratives by AI versus human authors","authors":"Gavin Raffloer, Melanie C Green","doi":"10.1016/j.chbah.2025.100168","DOIUrl":"10.1016/j.chbah.2025.100168","url":null,"abstract":"<div><div>Artificial Intelligence (AI) programs can produce narratives. However, readers' preconceptions about AI may influence their response to these narratives, and furthermore, AI-generated writing may differ from human writing. Genre may also be relevant for readers’ attitudes regarding AI. This study tests the effects of actual AI versus human authorship, stated (labeled) authorship, and genre on perceptions of narratives and narrative engagement. Participants were randomly assigned within a 2 (actual author: human or AI) X 2 (stated author: human or AI) X 2 (genre: romance or science fiction) design, across two studies. In Study 1, actual AI narratives were perceived as more enjoyable, but human narratives were more appreciated. Furthermore, participants enjoyed actual AI-written sci-fi more than human-written sci-fi. Study 2 found that actual AI stories were rated more highly, particularly in appreciation, transportation, character identification, and future engagement. However, stated human authorship led to higher ratings for romance, but not for sci-fi. An interaction was observed such that for the sci-fi condition, stated human writing was perceived as more likely to be actually AI-written. Future research could expand upon these findings across more genres, as well as examining the determinants of preferences for stated human content.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"5 ","pages":"Article 100168"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144263722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To Be competitive or not to be competitive: How performance goals shape human-AI and human-human collaboration","authors":"Spatola Nicolas","doi":"10.1016/j.chbah.2025.100169","DOIUrl":"10.1016/j.chbah.2025.100169","url":null,"abstract":"<div><div>Due to generative AI, and particularly algorithms using large language models, people's use of algorithms as recommendation tools is increasing at an unprecedented pace. While these tools are used in both private and work contexts, less is known about how the motivational context surrounding algorithm use impacts reliance patterns. This research examined how competitive versus non-performance goals affect adherence to algorithmic versus human recommendation. In Experiment 1, participants completed Raven's Matrices with optional algorithm assistance. Framing the task as a competitive test increased reliance on the algorithm compared to a control condition. This effect was mediated by heightened perceived usefulness but not accuracy. Experiment 2 introduced human assistance alongside the algorithm assistance from Experiment 1. Performance (compared to control) goals increased reliance on the algorithm over peer assistance by selectively enhancing the perceived usefulness of the algorithm versus human assistance. These results demonstrate how setting goals may influence the preference to rely on algorithmic or human assistance and particularly how performance goal contexts catalyze a situation in which participants are more prone to rely on algorithms compared to peer recommendation. These results are discussed with regard to social goals and social cognition in competitive settings with the aim of elucidating how motivational framing shapes human-AI collaborative dynamics, informing responsible system design.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"5 ","pages":"Article 100169"},"PeriodicalIF":0.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144241902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial social influence via human-embodied AI agent interaction in immersive virtual reality (VR): Effects of similarity-matching during health conversations","authors":"Sue Lim, Ralf Schmälzle, Gary Bente","doi":"10.1016/j.chbah.2025.100172","DOIUrl":"10.1016/j.chbah.2025.100172","url":null,"abstract":"<div><div>Interactions with artificial intelligence (AI) based agents can positively influence human behavior and judgment. However, studies to date focus on text-based conversational agents (CA) with limited embodiment, restricting our understanding of how social influence principles, such as physical similarity, apply to AI agents (i.e., artificial social influence). We address this gap by leveraging latest advances in AI (large language models) and combining them with immersive virtual reality (VR). Specifically, we built VR-ECAs, or embodied conversational agents that can engage in turn-taking conversations with humans about health-related topics in a virtual environment. Then we manipulated interpersonal similarity via gender matching and examined its effects on biobehavioral (i.e., gaze), social (e.g., agent likeability), and behavioral outcomes (i.e., healthy snack selection). We observed an interaction effect between agent and participant gender on biobehavioral outcomes: discussing health with opposite-gender agents tended to enhance gaze duration, with the effect stronger for male participants compared to their female counterparts. A similar directional pattern was observed for healthy snack selection. In addition, female participants liked the VR-ECAs more than their male counterparts, regardless of the VR-ECAs’ gender. Finally, participants experienced greater presence while conversing with embodied agents than chatting with text-only agents. Overall, our findings highlight embodiment as a crucial factor of AI's influence on human behavior, and our paradigm enables new experimental research at the intersection of social influence, human-AI communication, and immersive virtual reality (VR).</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"5 ","pages":"Article 100172"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144270226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whose mind is it anyway? A systematic review and exploration on agency in cognitive augmentation
Steeven Villa, Lisa L. Barth, Francesco Chiossi, Robin Welsch, Thomas Kosch
Computers in Human Behavior: Artificial Humans, Vol. 5, Article 100158 (2025-06-02). DOI: 10.1016/j.chbah.2025.100158

Technologies for human augmentation aim to enhance sensory, motor, and cognitive abilities. Despite the growing interest in cognitive augmentation, the sense of agency, i.e., the feeling of control over one's actions and their outcomes, remains underexplored. We conducted a systematic literature review, screening 434 human-computer interaction articles, and identified 27 papers examining agency in cognitive augmentation. Our analysis revealed a lack of objective methods for measuring the sense of agency. To address this research gap, we analyzed electroencephalography (EEG) data from 27 participants performing the Columbia Card Task with and without perceived AI assistance. We observed changes in alpha and low-beta power, demonstrating EEG as a measure of perceived cognitive agency. These findings demonstrate how EEG can quantify perceived agency, presenting a method to evaluate the impact of cognitive augmentation technologies on the sense of agency. This study not only provides a novel neurophysiological approach for assessing the impact of cognitive augmentation technologies on agency but also leads the way to designing interfaces that create user awareness regarding their sense of agency.
{"title":"Educational robotics: Parental views of telepresence robots as social and academic support for children undergoing cancer treatment in Denmark","authors":"Emilie Løvenstein Vegeberg , Mette Weibel Willard , Mads Lund Andersen , Lykke Brogaard Bertel , Hanne Bækgaard Larsen","doi":"10.1016/j.chbah.2025.100164","DOIUrl":"10.1016/j.chbah.2025.100164","url":null,"abstract":"<div><div>Disrupted school attendance can trigger social and academic setbacks in children with prolonged illness. This study explores parental perspectives of telepresence robots in facilitating social and academic inclusion of their children undergoing cancer treatment. Parents (n = 15) of school-aged children with cancer (n = 15) in Denmark participated in semi-structured interviews between November 2022 and July 2023. An abductive approach was used, based on thematic analysis and the Agential Realism theory. The analyses were structured around five themes: 1) multifaceted responsibilities and roles; 2) aid or burden; 3) robot personification; 4) social connectivity; and 5) educational support. From a parental perspective, telepresence robots can support regular school attendance in children with cancer, classmate interactions and facilitate information sharing about teaching content. Conversely, telepresence robots can impose an additional burden on parents of children with cancer including responsibility for facilitating robot use while lacking surplus resources otherwise dedicated to the sick child. This study corroborates the potential of telepresence robots to provide social and academic support for children undergoing treatment, thereby alleviating the burden faced by their parents.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"5 ","pages":"Article 100164"},"PeriodicalIF":0.0,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144241773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental evaluation of cognitive agents for collaboration in human-autonomy cyber defense teams","authors":"Yinuo Du , Baptiste Prébot , Tyler Malloy , Fei Fang , Cleotilde Gonzalez","doi":"10.1016/j.chbah.2025.100148","DOIUrl":"10.1016/j.chbah.2025.100148","url":null,"abstract":"<div><div>Autonomous agents are becoming increasingly prevalent and capable of collaborating with humans on interdependent tasks as teammates. There is increasing recognition that human-like agents might be natural human collaborators. However, there has been limited work on designing agents according to the principles of human cognition or in empirically testing their teamwork effectiveness. In this study, we introduce the Team Defense Game (TDG), a novel experimental platform for investigating human-autonomy teaming in cyber defense scenarios. We design an agent that relies on episodic memory to determine its actions (<em>Cognitive agent</em>) and compare its effectiveness with two types of autonomous agents: one that relies on heuristic reasoning (<em>Heuristic agent</em>) and one that behaves randomly (<em>Random agent</em>). These agents are compared in a human-autonomy team (HAT) performing a cyber-protection task in the TDG. We systematically evaluate how autonomous teammates’ abilities and competence impact the team’s interaction and outcomes. The results revealed that teams with Cognitive agents are the most effective partners, followed by teams with Heuristic and Random agents. Evaluation of collaborative team process metrics suggests that the cognitive agent is more adaptive to individual play styles of human teammates, but it is also inconsistent and less predictable than the Heuristic agent. Competent agents (Cognitive and Heuristic agents) require less human effort but might cause over-reliance. A post-experiment questionnaire showed that competent agents are rated more trustworthy and cooperative than Random agents. We also found that human participants’ subjective ratings correlate with their team performance, and humans tend to take the credit or responsibility for the team. Our work advances HAT research by providing empirical evidence of how the design of different autonomous agents (cognitive, heuristic, and random) affect team performance and dynamics in cybersecurity contexts. We propose that autonomous agents for HATs should possess both competence and human-like cognition while also ensuring predictable behavior or clear explanations to maintain human trust. Additionally, they should proactively seek human input to enhance teamwork effectiveness.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"4 ","pages":"Article 100148"},"PeriodicalIF":0.0,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143891973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}