Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-12-06. DOI: 10.48550/arXiv.2212.02875

Osman Ulger, Julian Wiederer, Mohsen Ghafoorian, Vasileios Belagiannis, P. Mettes



Abstract

Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, whereas relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We introduce a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on ActionGenome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods.
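To make the idea of a multi-task edge prediction loss concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes that each relation type is treated as its own classification task over the same set of candidate edges, and that the per-task cross-entropy terms are combined as a weighted sum. The function names, the task names ("spatial", "contact"), and the task weights are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class for one edge."""
    return -math.log(softmax(logits)[target])


def multi_task_edge_loss(edge_logits, edge_targets, task_weights):
    """Weighted sum of per-task mean cross-entropy losses (an assumed form).

    edge_logits:  {task: [class logits for each candidate edge]}
    edge_targets: {task: [target class index for each candidate edge]}
    task_weights: {task: float}, hypothetical per-task weighting
    """
    total = 0.0
    for task, weight in task_weights.items():
        per_edge = [
            cross_entropy(logits, target)
            for logits, target in zip(edge_logits[task], edge_targets[task])
        ]
        total += weight * sum(per_edge) / len(per_edge)
    return total


# Two candidate edges, scored for two hypothetical relation types.
example_logits = {
    "spatial": [[2.0, 0.5], [0.1, 1.2]],
    "contact": [[0.3, 0.3, 1.5], [1.0, 0.2, 0.2]],
}
example_targets = {"spatial": [0, 1], "contact": [2, 0]}
example_weights = {"spatial": 1.0, "contact": 0.5}
example_loss = multi_task_edge_loss(example_logits, example_targets, example_weights)
```

In this sketch every task is scored on the same edges, so a shared node representation would receive gradient from all relation types at once. That is one plausible reading of how modeling multiple relations can be mutually beneficial, but the abstract does not specify the actual loss.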


