Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models.

Jingyi Xie, Rui Yu, H E Zhang, Syed Masum Billah, Sooyeon Lee, John M Carroll
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI Conference, vol. 25. Published 2025-04-01 (Epub 2025-04-25).
DOI: https://doi.org/10.1145/3706598.3714210
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12338113/pdf/

Abstract

Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond basic usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users and analysis of image descriptions from both participants and social media using Be My AI (an LMM-based application), we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.
