Towards the future of pedestrian–AV interaction: Human perception vs. LLM insights on Smart Pole Interaction Unit in shared spaces

IF 5.1; Zone 2: Computer Science; JCR Q1: COMPUTER SCIENCE, CYBERNETICS

International Journal of Human-Computer Studies; Pub Date: 2025-09-12; DOI: 10.1016/j.ijhcs.2025.103628

Vishal Chauhan, Anubhav, Chia-Ming Chang, Xiang Su, Jin Nakazato, Ehsan Javanmardi, Alex Orsholits, Takeo Igarashi, Kantaro Fujiwara, Manabu Tsukada

Volume 205, Article 103628. https://www.sciencedirect.com/science/article/pii/S1071581925001855

Citations: 0

Abstract

As autonomous vehicles (AVs) reshape urban mobility, establishing effective communication between pedestrians and self-driving vehicles has become a critical safety imperative. This work investigates the integration of Smart Pole Interaction Units (SPIUs) as external human–machine interfaces (eHMIs) in shared spaces and introduces an innovative approach to enhance pedestrian–AV interactions. To provide subjective evidence on SPIU usability, we conduct a group design study (“Humans”) involving 25 participants (aged 18–40). We evaluate user preferences and interaction patterns using group discussion materials, revealing that 90% of the participants strongly prefer real-time multi-AV interactions facilitated by SPIU over conventional eHMI systems, where a pedestrian must look at multiple AVs individually. Participants also emphasize inclusive design through multi-sensory communication channels (visual, auditory, and tactile signals), specifically addressing the needs of vulnerable road users (VRUs), including those with impairments. To complement these non-expert, real-world insights, we employ three leading Large Language Models (LLMs) (ChatGPT-4, Gemini-Pro, and Claude 3.5 Sonnet) as “experts” due to their extensive training data. Leveraging the multimodal vision-language capabilities of these LLMs, we pose the identical questions (text and images) used in the human discussions to generate text responses for pedestrian–AV interaction scenarios. We then extract the most frequent words from the LLM responses and from the recorded human group discussions, and perform a keyword frequency analysis for both sources across three categories: Context, Safety, and Important. Our findings indicate that LLMs employ safety-related keywords 30% more frequently than human participants, suggesting a more structured, safety-centric approach. Among LLMs, ChatGPT-4 shows the best response latency, Claude aligns most closely with human responses, and Gemini-Pro provides structured and contextually relevant insights. Our results from “Humans” and “LLMs” establish SPIU as a promising system for facilitating trust-building and safety-ensuring interactions among pedestrians, AVs, and delivery robots. Integrating diverse stakeholder feedback, we propose a prototype SPIU design to advance pedestrian–AV interactions in shared urban spaces, positioning SPIUs as crucial infrastructure hubs for safe and trustworthy navigation.
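
The keyword frequency comparison described in the abstract could be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' pipeline: the category word lists, the function names (`tokenize`, `category_rates`, `relative_increase`), and the two sample texts are all hypothetical, since the abstract does not give the actual keyword lists or transcripts.

```python
import re
from collections import Counter

# Hypothetical keyword lists per category; the paper's real lists
# are not given in the abstract.
CATEGORIES = {
    "Context": {"crossing", "road", "vehicle", "pole"},
    "Safety": {"safe", "safety", "risk", "warning"},
    "Important": {"trust", "clear", "accessible"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_rates(text):
    """Return, per category, the share of tokens that are category keywords."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {
        name: (sum(counts[w] for w in words) / total if total else 0.0)
        for name, words in CATEGORIES.items()
    }


def relative_increase(rate_a, rate_b):
    """Percentage by which rate_a exceeds rate_b (rate_b must be non-zero)."""
    return (rate_a - rate_b) / rate_b * 100.0


if __name__ == "__main__":
    # Made-up sample texts standing in for a transcript and an LLM response.
    human_text = "The crossing felt safe but the vehicle gave no warning"
    llm_text = "Safety warning: the pole should signal risk before crossing"

    human = category_rates(human_text)
    llm = category_rates(llm_text)
    for name in CATEGORIES:
        print(name, round(human[name], 3), round(llm[name], 3))
    print("Safety increase (%):", relative_increase(llm["Safety"], human["Safety"]))
```

Normalizing by total token count keeps the comparison fair when the two sources differ in length, which is why the sketch compares rates instead of raw counts.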


Source journal

International Journal of Human-Computer Studies (Engineering & Technology: Computer Science, Cybernetics)

CiteScore: 11.50
Self-citation rate: 5.60%
Articles published: 108
Review time: 3 months

Journal description: The International Journal of Human-Computer Studies publishes original research over the whole spectrum of work relevant to the theory and practice of innovative interactive systems. The journal is inherently interdisciplinary, covering research in computing, artificial intelligence, psychology, linguistics, communication, design, engineering, and social organization that is relevant to the design, analysis, evaluation and application of innovative interactive systems. Papers at the boundaries of these disciplines are especially welcome, as it is our view that interdisciplinary approaches are needed for producing theoretical insights in this complex area and for effective deployment of innovative technologies in concrete user communities. Research areas relevant to the journal include, but are not limited to:
• Innovative interaction techniques
• Multimodal interaction
• Speech interaction
• Graphic interaction
• Natural language interaction
• Interaction in mobile and embedded systems
• Interface design and evaluation methodologies
• Design and evaluation of innovative interactive systems
• User interface prototyping and management systems
• Ubiquitous computing
• Wearable computers
• Pervasive computing
• Affective computing
• Empirical studies of user behaviour
• Empirical studies of programming and software engineering
• Computer supported cooperative work
• Computer mediated communication
• Virtual reality
• Mixed and augmented reality
• Intelligent user interfaces
• Presence ...