运动=视频-内容:从视频中实现运动表示的无监督学习

ACM Multimedia Asia Pub Date : 2021-12-01 DOI:10.1145/3469877.3490582

Hehe Fan, Mohan S. Kankanhalli

{"title":"运动=视频-内容:从视频中实现运动表示的无监督学习","authors":"Hehe Fan, Mohan S. Kankanhalli","doi":"10.1145/3469877.3490582","DOIUrl":null,"url":null,"abstract":"Motion, according to its definition in physics, is the change in position with respect to time, regardless of the specific moving object and background. In this paper, we aim to learn appearance-independent motion representation in an unsupervised manner. The main idea is to separate motion from videos while leaving objects and background as content. Specifically, we design an encoder-decoder model which consists of a content encoder, a motion encoder and a video generator. To train the model, we leverage a one-step cycle-consistency in reconstruction within the same video and a two-step cycle-consistency in generation across different videos as self-supervised signals, and use adversarial training to remove the content representation from the motion representation. We demonstrate that the proposed framework can be used for conditional video generation and fine-grained action recognition.","PeriodicalId":210974,"journal":{"name":"ACM Multimedia Asia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos\",\"authors\":\"Hehe Fan, Mohan S. Kankanhalli\",\"doi\":\"10.1145/3469877.3490582\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motion, according to its definition in physics, is the change in position with respect to time, regardless of the specific moving object and background. In this paper, we aim to learn appearance-independent motion representation in an unsupervised manner. The main idea is to separate motion from videos while leaving objects and background as content. Specifically, we design an encoder-decoder model which consists of a content encoder, a motion encoder and a video generator. To train the model, we leverage a one-step cycle-consistency in reconstruction within the same video and a two-step cycle-consistency in generation across different videos as self-supervised signals, and use adversarial training to remove the content representation from the motion representation. We demonstrate that the proposed framework can be used for conditional video generation and fine-grained action recognition.\",\"PeriodicalId\":210974,\"journal\":{\"name\":\"ACM Multimedia Asia\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Multimedia Asia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3469877.3490582\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Multimedia Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469877.3490582","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

根据物理学的定义，运动是位置相对于时间的变化，与特定的运动物体和背景无关。在本文中，我们的目标是以一种无监督的方式学习与外观无关的运动表示。其主要思想是将动作从视频中分离出来，同时将对象和背景作为内容。具体来说，我们设计了一个由内容编码器、运动编码器和视频生成器组成的编码器-解码器模型。为了训练模型，我们利用同一视频内重建的一步循环一致性和跨不同视频生成的两步循环一致性作为自监督信号，并使用对抗性训练从运动表示中去除内容表示。我们证明了该框架可以用于条件视频生成和细粒度动作识别。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Motion = Video - Content: Towards Unsupervised Learning of Motion Representation from Videos

Motion, according to its definition in physics, is the change in position with respect to time, regardless of the specific moving object and background. In this paper, we aim to learn appearance-independent motion representation in an unsupervised manner. The main idea is to separate motion from videos while leaving objects and background as content. Specifically, we design an encoder-decoder model which consists of a content encoder, a motion encoder and a video generator. To train the model, we leverage a one-step cycle-consistency in reconstruction within the same video and a two-step cycle-consistency in generation across different videos as self-supervised signals, and use adversarial training to remove the content representation from the motion representation. We demonstrate that the proposed framework can be used for conditional video generation and fine-grained action recognition.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Multimedia Asia

自引率

0.00%

发文量