What Will I Do Next? The Intention from Motion Experiment

Andrea Zunino, Jacopo Cavazza, A. Koul, A. Cavallo, C. Becchio, Vittorio Murino
{"title":"What Will I Do Next? The Intention from Motion Experiment","authors":"Andrea Zunino, Jacopo Cavazza, A. Koul, A. Cavallo, C. Becchio, Vittorio Murino","doi":"10.1109/CVPRW.2017.7","DOIUrl":null,"url":null,"abstract":"In computer vision, video-based approaches have been widely explored for the early classification and the prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) can still be reliable for the prediction of human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging but yet affordable as proved by quantitative cognitive studies which exploit the 3D kinematics acquired through motion capture systems.In this paper, we bridge cognitive and computer vision studies, by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Precisely, we propose Intention from Motion, a new paradigm where, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle itself has been reached (to pass it or to place in a box, or to pour or to drink the liquid inside).We process only the grasping onsets casting intention prediction as a classification framework. Leveraging on our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the most commonly used 3D descriptors from cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve an equivalent performance, we demonstrate that computer vision tools are effective in capturing the kinematics and facing the cognitive problem of human intention prediction.","PeriodicalId":6668,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"19 1","pages":"1-8"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW.2017.7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In computer vision, video-based approaches have been widely explored for the early classification and prediction of actions or activities. However, it remains unclear whether this modality (as compared to 3D kinematics) is also reliable for predicting human intentions, defined as the overarching goal embedded in an action sequence. Since the same action can be performed with different intentions, this problem is more challenging, yet tractable, as demonstrated by quantitative cognitive studies that exploit 3D kinematics acquired through motion capture systems. In this paper, we bridge cognitive and computer vision studies by demonstrating the effectiveness of video-based approaches for the prediction of human intentions. Specifically, we propose Intention from Motion, a new paradigm in which, without using any contextual information, we consider instantaneous grasping motor acts involving a bottle in order to forecast why the bottle has been reached (to pass it, to place it in a box, or to pour or drink the liquid inside). We process only the grasping onsets, casting intention prediction as a classification problem. Leveraging our multimodal acquisition (3D motion capture data and 2D optical videos), we compare the 3D descriptors most commonly used in cognitive studies with state-of-the-art video-based techniques. Since the two analyses achieve equivalent performance, we demonstrate that computer vision tools effectively capture the kinematics and can address the cognitive problem of human intention prediction.
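The abstract casts intention prediction as supervised classification over grasp-onset data with four intention labels (pass, place, pour, drink). The sketch below is a minimal illustration of that framing, not the paper's actual pipeline: the simple velocity/acceleration descriptors, the `onset_features` helper, the linear SVM, and the toy data are all assumptions, since the abstract does not specify which descriptors or classifier were used.

```python
# Hypothetical sketch: classify the intention behind a grasping act from
# motion-capture trajectories of the grasp onset only. Descriptors and the
# LinearSVC classifier are illustrative choices, not the paper's method.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

INTENTIONS = ["pass", "place", "pour", "drink"]  # the four intentions named in the abstract

def onset_features(markers: np.ndarray) -> np.ndarray:
    """Fixed-length descriptor of one grasp onset.

    markers: (T, M, 3) array -- T frames of M 3D motion-capture markers.
    Mean per-marker speed and acceleration magnitude stand in for the
    richer kinematic descriptors used in the cognitive literature.
    """
    vel = np.diff(markers, axis=0)          # (T-1, M, 3) frame-to-frame velocity
    acc = np.diff(vel, axis=0)              # (T-2, M, 3) acceleration
    speed = np.linalg.norm(vel, axis=2)     # (T-1, M) marker speeds
    acc_mag = np.linalg.norm(acc, axis=2)   # (T-2, M) acceleration magnitudes
    return np.concatenate([speed.mean(axis=0), acc_mag.mean(axis=0)])

# Toy stand-in data: 40 grasping trials, 10 per intention class.
rng = np.random.default_rng(0)
trials = [rng.normal(size=(100, 20, 3)) for _ in range(40)]
y = np.repeat(np.arange(len(INTENTIONS)), 10)

X = np.stack([onset_features(t) for t in trials])
clf = make_pipeline(StandardScaler(), LinearSVC())
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

The same classification framing would apply to the video modality by swapping `onset_features` for descriptors computed from the 2D frames, which is the comparison the paper draws between 3D kinematic and video-based features.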