Title: Towards a Multimodal and Context-Aware Framework for Human Navigational Intent Inference
Author: Z. Zhang
DOI: 10.1145/3382507.3421156
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction
Publication date: 2020-10-21
Citations: 2
Abstract
A socially acceptable robot needs to make correct decisions and understand human intent in order to interact with and navigate around humans safely. Although research in computer vision and robotics has made huge advances in recent years, today's robotic systems still need a better understanding of human intent to become more effective and widely accepted. Currently, such inference is typically performed using only one mode of perception, such as vision or human movement trajectory. In this extended abstract, I describe my PhD research plan for developing a novel multimodal and context-aware framework, in which a robot infers human navigational intentions through multimodal perception comprising temporal facial, body pose, and gaze features, human motion features, and environmental context. To facilitate this framework, a data collection experiment is designed to acquire multimodal human-robot interaction data. Our initial design of the framework is based on a temporal neural network model that takes human motion, body pose, and head orientation features as input; we will increase the complexity of the neural network model, as well as the set of input features, along the way. In the long term, this framework can benefit a variety of settings such as autonomous driving, service robots, and household robots.
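To make the described pipeline concrete, the sketch below shows how a temporal model could map a sequence of concatenated motion, body pose, and head orientation features to a navigational intent label. This is a minimal illustration only: the feature dimensions, the intent label set, the plain tanh RNN cell, and the random (untrained) weights are all assumptions, not the actual model or data from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame multimodal feature vector: 2-D motion (velocity),
# 3-D body pose summary, and 2-D head orientation (yaw, pitch) -> 7-D input.
MOTION_DIM, POSE_DIM, HEAD_DIM = 2, 3, 2
INPUT_DIM = MOTION_DIM + POSE_DIM + HEAD_DIM
HIDDEN_DIM = 16
INTENTS = ["approach", "pass_by", "avoid"]  # illustrative label set

# Randomly initialised weights stand in for trained parameters.
W_xh = rng.standard_normal((INPUT_DIM, HIDDEN_DIM)) * 0.1
W_hh = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1
W_hy = rng.standard_normal((HIDDEN_DIM, len(INTENTS))) * 0.1

def infer_intent(frames: np.ndarray) -> str:
    """Run a plain tanh RNN over a (T, INPUT_DIM) feature sequence
    and classify the final hidden state into one intent label."""
    h = np.zeros(HIDDEN_DIM)
    for x in frames:                      # one step per video frame
        h = np.tanh(x @ W_xh + h @ W_hh)  # fuse current features with history
    logits = h @ W_hy                     # score each candidate intent
    return INTENTS[int(np.argmax(logits))]

# A 30-frame synthetic sequence of concatenated multimodal features.
sequence = rng.standard_normal((30, INPUT_DIM))
predicted = infer_intent(sequence)
print(predicted)
```

In a trained system, the RNN cell would typically be replaced by an LSTM or GRU, and additional streams (facial features, gaze, environmental context) could be appended to the per-frame feature vector, matching the plan of growing both model complexity and input features over time.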