具有全局时间一致性的视频到视频翻译

Proceedings of the 26th ACM international conference on Multimedia Pub Date : 2018-10-15 DOI:10.1145/3240508.3240708

Xingxing Wei, Jun Zhu, Sitong Feng, Hang Su

{"title":"具有全局时间一致性的视频到视频翻译","authors":"Xingxing Wei, Jun Zhu, Sitong Feng, Hang Su","doi":"10.1145/3240508.3240708","DOIUrl":null,"url":null,"abstract":"Although image-to-image translation has been widely studied, the video-to-video translation is rarely mentioned. In this paper, we propose an unified video-to-video translation framework to accom- plish different tasks, like video super-resolution, video colouriza- tion, and video segmentation, etc. A consequent question within video-to-video translation lies in the flickering appearance along with the varying frames. To overcome this issue, a usual method is to incorporate the temporal loss between adjacent frames in the optimization, which is a kind of local frame-wise temporal con- sistency. We instead present a residual error based mechanism to ensure the video-level consistency of the same location in different frames (called (lobal temporal consistency). The global and local consistency are simultaneously integrated into our video-to-video framework to achieve more stable videos. Our method is based on the GAN framework, where we present a two-channel discrimina- tor. One channel is to encode the video RGB space, and another is to encode the residual error of the video as a whole to meet the global consistency. Extensive experiments conducted on different video- to-video translation tasks verify the effectiveness and flexibleness of the proposed method.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Video-to-Video Translation with Global Temporal Consistency\",\"authors\":\"Xingxing Wei, Jun Zhu, Sitong Feng, Hang Su\",\"doi\":\"10.1145/3240508.3240708\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although image-to-image translation has been widely studied, the video-to-video translation is rarely mentioned. In this paper, we propose an unified video-to-video translation framework to accom- plish different tasks, like video super-resolution, video colouriza- tion, and video segmentation, etc. A consequent question within video-to-video translation lies in the flickering appearance along with the varying frames. To overcome this issue, a usual method is to incorporate the temporal loss between adjacent frames in the optimization, which is a kind of local frame-wise temporal con- sistency. We instead present a residual error based mechanism to ensure the video-level consistency of the same location in different frames (called (lobal temporal consistency). The global and local consistency are simultaneously integrated into our video-to-video framework to achieve more stable videos. Our method is based on the GAN framework, where we present a two-channel discrimina- tor. One channel is to encode the video RGB space, and another is to encode the residual error of the video as a whole to meet the global consistency. Extensive experiments conducted on different video- to-video translation tasks verify the effectiveness and flexibleness of the proposed method.\",\"PeriodicalId\":339857,\"journal\":{\"name\":\"Proceedings of the 26th ACM international conference on Multimedia\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3240508.3240708\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3240508.3240708","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

虽然图像到图像的翻译已经被广泛研究，但视频到视频的翻译却很少被提及。在本文中，我们提出了一个统一的视频到视频翻译框架来完成不同的任务，如视频超分辨率、视频彩色化和视频分割等。在视频到视频的翻译中，随之而来的问题是随着帧的变化而出现的闪烁现象。为了克服这一问题，通常的方法是在优化中考虑相邻帧之间的时间损失，这是一种局部逐帧时间一致性。我们提出了一种基于残差的机制来确保同一位置在不同帧中的视频级一致性(称为全局时间一致性)。同时将全局一致性和局部一致性整合到我们的视频到视频框架中，以实现更稳定的视频。我们的方法是基于GAN框架，其中我们提出了一个双通道鉴别器。一个通道是对视频RGB空间进行编码，另一个通道是将视频的残差作为一个整体进行编码，以满足全局一致性。在不同的视频到视频翻译任务上进行的大量实验验证了该方法的有效性和灵活性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Video-to-Video Translation with Global Temporal Consistency

Although image-to-image translation has been widely studied, the video-to-video translation is rarely mentioned. In this paper, we propose an unified video-to-video translation framework to accom- plish different tasks, like video super-resolution, video colouriza- tion, and video segmentation, etc. A consequent question within video-to-video translation lies in the flickering appearance along with the varying frames. To overcome this issue, a usual method is to incorporate the temporal loss between adjacent frames in the optimization, which is a kind of local frame-wise temporal con- sistency. We instead present a residual error based mechanism to ensure the video-level consistency of the same location in different frames (called (lobal temporal consistency). The global and local consistency are simultaneously integrated into our video-to-video framework to achieve more stable videos. Our method is based on the GAN framework, where we present a two-channel discrimina- tor. One channel is to encode the video RGB space, and another is to encode the residual error of the video as a whole to meet the global consistency. Extensive experiments conducted on different video- to-video translation tasks verify the effectiveness and flexibleness of the proposed method.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 26th ACM international conference on Multimedia

自引率

0.00%

发文量