Answering visual questions with conversational crowd assistants

Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility. Pub Date: 2013-10-21. DOI: 10.1145/2513383.2517033

Walter S. Lasecki, Phyo Thiha, Yu Zhong, Erin L. Brady, Jeffrey P. Bigham

Citations: 79

Abstract

Blind people face a range of accessibility challenges in their everyday lives, from reading the text on a package of food to traveling independently in a new place. Answering general questions about one's visual surroundings remains well beyond the capabilities of fully automated systems, but recent systems are showing the potential of engaging on-demand human workers (the crowd) to answer visual questions. The input to such systems has generally been a single image, which can limit the interaction with a worker to one question; or video streams where systems have paired the end user with a single worker, limiting the benefits of the crowd. In this paper, we introduce Chorus:View, a system that assists users over the course of longer interactions by engaging workers in a continuous conversation with the user about a video stream from the user's mobile device. We demonstrate the benefit of using multiple crowd workers instead of just one in terms of both latency and accuracy, then conduct a study with 10 blind users that shows Chorus:View answers common visual questions more quickly and accurately than existing approaches. We conclude with a discussion of users' feedback and potential future work on interactive crowd support of blind users.
