Improving Action Segmentation via Graph-Based Temporal Reasoning

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.01404

Yifei Huang, Yusuke Sugano, Yoichi Sato

Pages: 14021-14031

Citations: 92

Abstract

Temporal relations among multiple action segments play an important role in action segmentation, especially when observations are limited (e.g., actions are occluded by other objects or happen outside the field of view). In this paper, we propose a network module called the Graph-based Temporal Reasoning Module (GTRM) that can be built on top of existing action segmentation models to learn the relations among multiple action segments over various time spans. We model the relations using two Graph Convolution Networks (GCNs) in which each node represents an action segment. The two graphs have different edge properties to account for the boundary regression and classification tasks, respectively. By applying graph convolution, we update each node's representation based on its relations with neighboring nodes. The updated representations are then used for improved action segmentation. We evaluate our model on two challenging egocentric datasets, EGTEA and EPIC-Kitchens, where actions may be only partially observed due to the viewpoint restriction. The results show that our proposed GTRM outperforms state-of-the-art action segmentation models by a large margin. We also demonstrate the effectiveness of our model on two third-person video datasets, the 50Salads dataset and the Breakfast dataset.
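The abstract does not spell out the propagation rule, so the following is only an illustrative sketch of how graph convolution can update segment-level node features, not the authors' implementation. It assumes the commonly used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), and a hypothetical adjacency that links segments lying within a fixed temporal window. The function names, the window parameter, and the toy features are all assumptions made for illustration.

```python
import math


def temporal_adjacency(num_segments, window=1):
    """Hypothetical edge rule: connect segments at most `window` positions apart."""
    return [
        [1.0 if i != j and abs(i - j) <= window else 0.0 for j in range(num_segments)]
        for i in range(num_segments)
    ]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(features, adj, weight):
    """One graph-convolution step: mix neighbor features, project, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, value) for value in row] for row in projected]


# Toy example: three consecutive action segments with 2-D features (made-up values).
segment_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
identity_weight = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a learned weight matrix
adjacency = temporal_adjacency(3, window=1)
updated_features = gcn_layer(segment_features, adjacency, identity_weight)
```

In this toy setup each segment's updated feature vector is a degree-weighted mix of its own features and those of its temporal neighbors. According to the abstract, GTRM uses two such graphs with different edge properties, one for boundary regression and one for classification; how those edges are defined is not stated here, so the window rule above is purely a placeholder.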

