Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization

ACM Multimedia Asia. Pub Date: 2021-12-01. DOI: 10.1145/3469877.3490608

Haopeng Xie, Liang Xiao, Huicong Wu



Abstract

Video jitter is an unpleasant artifact of irregular lens motion over time. Extracting motion state information from a run of consecutive video frames is a central problem in video stabilization. In this paper, we propose a novel sequence model, Intra- and Inter-frame Iterative Temporal Convolutional Networks (I3TC-Net), which alternately transfers the spatial-temporal correlation of motion within and between frames. We hypothesize that the motion state information can be represented by transmission states. Specifically, we employ a combination of Convolutional Long Short-Term Memory (ConvLSTM) and an embedded encoder-decoder to generate the latent stable frame. This frame is used to update the transmission states iteratively and to learn a global homography transformation for each unstable frame, which produces the corresponding stabilized result along the time axis. Furthermore, we create a video dataset to address the lack of stable data and improve training. Experimental results show that our method outperforms state-of-the-art methods on publicly available videos, including a 5.4-point improvement in stability score. The project page is available at https://github.com/root2022IIITC/IIITC.
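
The abstract states that a global homography is learned for each unstable frame and then used to produce the stabilized result. The sketch below illustrates only that final warping step, using plain Python and homogeneous coordinates. It is not the authors' implementation: the function name `apply_homography`, the frame size, and the jitter offsets are hypothetical, and the homographies here are hand-built translations rather than network outputs.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def apply_homography(h: Matrix3, points: Sequence[Point]) -> List[Point]:
    """Map 2D points through a 3x3 homography using homogeneous coordinates."""
    warped = []
    for x, y in points:
        xh = h[0][0] * x + h[0][1] * y + h[0][2]
        yh = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        # Divide by the homogeneous coordinate to return to pixel coordinates.
        warped.append((xh / w, yh / w))
    return warped


# Assumption: jitter is modeled as a per-frame translation (a special case
# of a homography). A learned homography would generally have 8 free parameters.
jitter_offsets = [(3.0, -2.0), (-1.5, 4.0), (0.5, 0.5)]
homographies = [
    [[1.0, 0.0, -dx], [0.0, 1.0, -dy], [0.0, 0.0, 1.0]]
    for dx, dy in jitter_offsets
]

# Corners of a hypothetical 640x360 frame, displaced by each jitter offset.
corners = [(0.0, 0.0), (640.0, 0.0), (640.0, 360.0), (0.0, 360.0)]
for (dx, dy), h in zip(jitter_offsets, homographies):
    shaken = [(x + dx, y + dy) for x, y in corners]
    stabilized = apply_homography(h, shaken)
    print(stabilized)
```

In this toy setting each homography exactly cancels its frame's offset, so the warped corners coincide with the undisturbed frame corners. A full pipeline would resample every pixel of the frame through the homography instead of mapping only four corner points.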
