Answering visual questions with conversational crowd assistants

Walter S. Lasecki, Phyo Thiha, Yu Zhong, Erin L. Brady, Jeffrey P. Bigham

Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility
DOI: 10.1145/2513383.2517033 | Published: 2013-10-21 | Citations: 79
Blind people face a range of accessibility challenges in their everyday lives, from reading the text on a package of food to traveling independently in a new place. Answering general questions about one's visual surroundings remains well beyond the capabilities of fully automated systems, but recent systems are showing the potential of engaging on-demand human workers (the crowd) to answer visual questions. The input to such systems has generally been either a single image, which can limit the interaction with a worker to a single question, or a video stream for which the system pairs the end user with a single worker, limiting the benefits of the crowd. In this paper, we introduce Chorus:View, a system that assists users over the course of longer interactions by engaging workers in a continuous conversation with the user about a video stream from the user's mobile device. We demonstrate the benefit of using multiple crowd workers instead of just one in terms of both latency and accuracy, then conduct a study with 10 blind users that shows Chorus:View answers common visual questions more quickly and accurately than existing approaches. We conclude with a discussion of users' feedback and potential future work on interactive crowd support of blind users.
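The latency and accuracy argument in the abstract is easy to see with a back-of-the-envelope simulation. The sketch below is purely illustrative and not from the paper: it assumes exponentially distributed worker response times and a fixed per-worker accuracy (the constants WORKER_ACCURACY and MEAN_RESPONSE_SEC are invented for this example), takes the first response as the answer latency, and uses a majority vote as the group answer.

```python
import random

# Hypothetical illustration (not the paper's actual method): why routing
# one question to several crowd workers at once can beat a single worker
# on both latency and accuracy. All parameters below are assumed values.

random.seed(0)

WORKER_ACCURACY = 0.8      # assumed chance a single worker answers correctly
MEAN_RESPONSE_SEC = 20.0   # assumed mean per-worker response time (seconds)
TRIALS = 10_000

def worker_response():
    """One simulated worker: (latency in seconds, answered correctly?)."""
    latency = random.expovariate(1.0 / MEAN_RESPONSE_SEC)
    return latency, random.random() < WORKER_ACCURACY

def ask(num_workers):
    """Ask num_workers in parallel: latency is the first response to arrive,
    and the group's answer is a majority vote over all responses."""
    responses = [worker_response() for _ in range(num_workers)]
    first_latency = min(lat for lat, _ in responses)
    votes = sum(ok for _, ok in responses)
    return first_latency, votes > num_workers / 2

for n in (1, 3, 5):
    results = [ask(n) for _ in range(TRIALS)]
    avg_latency = sum(lat for lat, _ in results) / TRIALS
    accuracy = sum(ok for _, ok in results) / TRIALS
    print(f"{n} worker(s): first answer after {avg_latency:5.1f}s on average, "
          f"majority vote correct {accuracy:.0%} of the time")
```

Under these assumptions, three parallel workers cut expected first-response latency to about a third of a single worker's and lift majority-vote accuracy from 80% to roughly 90%, which mirrors the direction of the multi-worker benefit the paper reports.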