Deep Learning Approaches to Predict Future Frames in Videos

T. Islam, Md. Hafizul Imran, Md. Ramim Hossain, Md. Tamjeed Monshi, Himanish Debnath Himu, Md. Ashikur Rahman, Gourob Saha Surjo
Int. J. Recent Contributions Eng. Sci. IT · Journal Article · Published 2022-11-04
DOI: 10.3991/ijes.v10i03.33893 · Citations: 3

Abstract

Deep neural networks are becoming central to several areas of computer vision. While image and video classification have been studied extensively, future frame prediction remains comparatively under-investigated, even though several applications could make good use of knowledge about the next frame of an image sequence in pixel space. Examples include video compression and autonomous agents in robotics that must act in natural environments. Learning to forecast the future of an image sequence requires a system to understand and efficiently encode its content and dynamics over a certain period. Frame prediction is also viewed as a promising avenue from which even supervised tasks could benefit, since labeled video data is limited and hard to obtain. This work therefore gives an overview of scientific advances in future frame prediction and proposes a recurrent network model that draws on recent techniques from deep learning research. The presented architecture is based on a recurrent encoder-decoder framework with convolutional cells, which preserves spatio-temporal correlations in the data. Driven by perceptually motivated objective functions and a modern recurrent learning strategy, it outperforms existing approaches to future frame generation across several types of video content, while requiring fewer training iterations and model parameters.
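To make the "recurrent encoder-decoder with convolutional cells" idea concrete, here is a minimal sketch of a convolutional LSTM cell rolled over a short clip to predict the next frame. This is an illustrative reconstruction of the general technique, not the paper's actual architecture; the channel counts, kernel size, and the 1×1-conv decoder are placeholder choices.

```python
# Illustrative sketch (not the paper's exact model): a single ConvLSTM cell,
# the kind of convolutional recurrent unit a frame-prediction
# encoder-decoder can be built from. All hyperparameters are placeholders.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: replaces the dense matmuls of a standard
    LSTM with convolutions, so the hidden state keeps its spatial layout
    and spatio-temporal correlations are preserved."""

    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        pad = kernel // 2
        # One convolution computes all four gates at once
        # (input, forget, output, candidate), each with hid_ch channels.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=pad)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


# Roll the cell over an input clip, then decode the final hidden state
# into a predicted next frame with a 1x1 convolution (a deliberately
# minimal stand-in for a real decoder).
cell = ConvLSTMCell(in_ch=1, hid_ch=8)
decode = nn.Conv2d(8, 1, kernel_size=1)

clip = torch.randn(2, 5, 1, 16, 16)   # (batch, time, channels, H, W)
h = torch.zeros(2, 8, 16, 16)
c = torch.zeros(2, 8, 16, 16)
for t in range(clip.shape[1]):
    h, c = cell(clip[:, t], (h, c))
pred = decode(h)                       # predicted frame at t+1
print(tuple(pred.shape))               # (2, 1, 16, 16)
```

In practice such cells are stacked into an encoder (which compresses the observed frames) and a decoder (which generates future frames), and trained with the perceptually motivated losses the abstract refers to rather than plain pixel-wise error.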