{"title":"手势增强对模糊人机指令的理解","authors":"Dulanga Weerakoon, Vigneshwaran Subbaraju, Nipuni Karumpulli, Tuan Tran, Qianli Xu, U-Xuan Tan, Joo-Hwee Lim, Archan Misra","doi":"10.1145/3382507.3418863","DOIUrl":null,"url":null,"abstract":"This work demonstrates the feasibility and benefits of using pointing gestures, a naturally-generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks.We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism, over a multi-modal input of vision, natural language text and pointing. Via multiple studies related to a benchmark table top manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant (30%) accuracy improvement when verbal instructions are ambiguous.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Gesture Enhanced Comprehension of Ambiguous Human-to-Robot Instructions\",\"authors\":\"Dulanga Weerakoon, Vigneshwaran Subbaraju, Nipuni Karumpulli, Tuan Tran, Qianli Xu, U-Xuan Tan, Joo-Hwee Lim, Archan Misra\",\"doi\":\"10.1145/3382507.3418863\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work demonstrates the feasibility and benefits of using pointing gestures, a naturally-generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks.We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism, over a multi-modal input of vision, natural language text and pointing. 
Via multiple studies related to a benchmark table top manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant (30%) accuracy improvement when verbal instructions are ambiguous.\",\"PeriodicalId\":402394,\"journal\":{\"name\":\"Proceedings of the 2020 International Conference on Multimodal Interaction\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2020 International Conference on Multimodal Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3382507.3418863\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3382507.3418863","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This work demonstrates the feasibility and benefits of using pointing gestures, a naturally generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks. We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism over a multi-modal input of vision, natural language text and pointing. Via multiple studies related to a benchmark tabletop manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant (30%) accuracy improvement when verbal instructions are ambiguous.
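For intuition only, the following is a minimal sketch of the kind of multimodal grounding the abstract describes: a language-match score over parsed attributes is fused with a pointing-likelihood score that tolerates location uncertainty, and the highest-scoring object is selected. The helper names (Candidate, ground_referent), the Gaussian angular model, and the equal fusion weight are all illustrative assumptions; they are not taken from the M2Gestic paper, which uses neural text parsing and knowledge-graph traversal instead.

```python
# Hypothetical sketch: resolve an ambiguous instruction by fusing a parsed
# verbal referent with an uncertain pointing gesture. Not M2Gestic's actual
# implementation; names, scores and weights are illustrative assumptions.
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str         # e.g. "red block"
    attributes: set   # parsed attributes, e.g. {"red", "block"}
    position: tuple   # (x, y) tabletop coordinates in metres

def language_score(query_attrs: set, cand: Candidate) -> float:
    """Fraction of the instruction's attributes matched by the candidate."""
    if not query_attrs:
        return 0.0
    return len(query_attrs & cand.attributes) / len(query_attrs)

def pointing_score(origin, direction, cand: Candidate, sigma_deg=10.0) -> float:
    """Gaussian likelihood of the candidate under an uncertain pointing ray."""
    dx, dy = cand.position[0] - origin[0], cand.position[1] - origin[1]
    angle_to_cand = math.atan2(dy, dx)
    ray_angle = math.atan2(direction[1], direction[0])
    diff = math.degrees(abs(angle_to_cand - ray_angle))
    diff = min(diff, 360.0 - diff)  # wrap around the circle
    return math.exp(-0.5 * (diff / sigma_deg) ** 2)

def ground_referent(query_attrs, origin, direction, candidates, w=0.5):
    """Pick the candidate that best explains both modalities."""
    return max(candidates,
               key=lambda c: w * language_score(query_attrs, c)
                           + (1 - w) * pointing_score(origin, direction, c))

# Ambiguous instruction "pick up the block" plus a pointing gesture:
blocks = [Candidate("red block", {"red", "block"}, (0.30, 0.20)),
          Candidate("blue block", {"blue", "block"}, (-0.25, 0.35))]
chosen = ground_referent({"block"}, (0.0, 0.0), (0.3, 0.2), blocks)
print(chosen.name)  # -> "red block", the object the gesture points toward
```

In this toy setup the verbal cue alone is ambiguous (both objects are blocks), so the pointing term breaks the tie, mirroring the paper's finding that gesture input helps most when instructions are ambiguous.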