Jingyi Xie, Rui Yu, H E Zhang, Syed Masum Billah, Sooyeon Lee, John M Carroll
{"title":"超越视觉感知:视觉受损用户与大型多模态模型的智能手机交互的见解。","authors":"Jingyi Xie, Rui Yu, H E Zhang, Syed Masum Billah, Sooyeon Lee, John M Carroll","doi":"10.1145/3706598.3714210","DOIUrl":null,"url":null,"abstract":"<p><p>Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond basic usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users and analysis of image descriptions from both participants and social media using Be My AI (an LMM-based application), we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.</p>","PeriodicalId":74552,"journal":{"name":"Proceedings of the SIGCHI conference on human factors in computing systems. CHI Conference","volume":"25 ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12338113/pdf/","citationCount":"0","resultStr":"{\"title\":\"Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models.\",\"authors\":\"Jingyi Xie, Rui Yu, H E Zhang, Syed Masum Billah, Sooyeon Lee, John M Carroll\",\"doi\":\"10.1145/3706598.3714210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond basic usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users and analysis of image descriptions from both participants and social media using Be My AI (an LMM-based application), we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.</p>\",\"PeriodicalId\":74552,\"journal\":{\"name\":\"Proceedings of the SIGCHI conference on human factors in computing systems. 
CHI Conference\",\"volume\":\"25 \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12338113/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the SIGCHI conference on human factors in computing systems. CHI Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3706598.3714210\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/4/25 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the SIGCHI conference on human factors in computing systems. CHI Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3706598.3714210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/25 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models.
Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond basic usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users and analysis of image descriptions from both participants and social media using Be My AI (an LMM-based application), we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.
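As a rough illustration of the interaction pattern the study examines (a photo is sent to an LMM and the returned description is read aloud), the sketch below wires a multimodal chat API to a local text-to-speech engine. It is not Be My AI's implementation and does not come from the paper: the describe_image_aloud helper, the gpt-4o model name, the prompt wording, and the use of the OpenAI Python SDK and pyttsx3 are all assumptions made for illustration. Passing the user's stated goal alongside the image gestures at the intent-oriented behavior the authors argue current tools lack.

```python
# Illustrative sketch of the "image -> LMM description -> speech" pattern.
# Not the paper's or Be My AI's actual implementation; model name, prompt,
# and TTS choice are assumptions.

import base64

from openai import OpenAI  # assumes the OpenAI Python SDK is installed
import pyttsx3             # assumes a local text-to-speech engine is available


def describe_image_aloud(image_path: str, user_intent: str) -> str:
    """Send a photo plus the user's stated intent to a multimodal model,
    then read the returned description aloud."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                # Including the user's intent is one way to nudge the model
                # toward task-relevant answers rather than generic scene
                # descriptions.
                {"type": "text",
                 "text": f"I am a blind user. My goal: {user_intent}. "
                         "Describe only what is relevant to that goal."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    description = response.choices[0].message.content

    # Speak the description; a production assistive app would normally defer
    # to the platform screen reader instead of a separate TTS engine.
    engine = pyttsx3.init()
    engine.say(description)
    engine.runAndWait()
    return description
```

In practice, speech output in apps like this is usually handled by the platform screen reader (e.g., VoiceOver or TalkBack) rather than a bundled TTS engine, which keeps the description navigable as text the same way the study's participants reviewed Be My AI's output.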