{"title":"DeltaFrame-BP: An Algorithm Using Frame Difference for Deep Convolutional Neural Networks Training and Inference on Video Data","authors":"Bing Han;Kaushik Roy","doi":"10.1109/TMSCS.2018.2865303","DOIUrl":null,"url":null,"abstract":"Inspired by the success of deep convolutional neural networks (CNNs) with back-propagation (BP) training on large-scale image recognition tasks, recent research efforts concentrated on expending deep CNNs toward more challenging automatized video analysis, such as video classification, object tracking, action recognition and optical flow detection. Video comprises a sequence of images (frames) captured over time in which image data is a function of space and time. Extracting three-dimensional spatial-temporal features from multiple frames becomes a key ingredient for capturing and incorporating appearance and dynamic representations using deep CNNs. Hence, training deep CNNs on video involves significant computational resources and energy consumption due to extended number of frames across the time line of video length. We propose DeltaFrame-BP, a deep learning algorithm, which significantly reduces computational cost and energy consumption without accuracy degradation by streaming frame differences for deep CNNs training and inference. The inherent similarity between video frames due to high fps (frames per second) in video recording helps achieving high-sparsity and low-dynamic range data streaming using frame differences in comparison with raw video frames. According to our simulation, nearly 25 percent energy reduction was achieved in training using the proposed accuracy-lossless DeltaFrame-BP algorithm in comparison with the standard Back-propagation algorithm.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"624-634"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2865303","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multi-Scale Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/8434331/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Inspired by the success of deep convolutional neural networks (CNNs) trained with back-propagation (BP) on large-scale image recognition tasks, recent research efforts have concentrated on extending deep CNNs toward more challenging automated video analysis, such as video classification, object tracking, action recognition, and optical flow detection. Video comprises a sequence of images (frames) captured over time, in which the image data is a function of space and time. Extracting three-dimensional spatial-temporal features from multiple frames becomes a key ingredient for capturing and incorporating appearance and dynamic representations using deep CNNs. Hence, training deep CNNs on video demands significant computational resources and energy due to the large number of frames spanning the timeline of a video. We propose DeltaFrame-BP, a deep learning algorithm that significantly reduces computational cost and energy consumption without accuracy degradation by streaming frame differences for deep CNN training and inference. The inherent similarity between video frames, a consequence of the high fps (frames per second) of video recording, helps achieve high-sparsity, low-dynamic-range data streams when frame differences are used in place of raw video frames. According to our simulation, nearly 25 percent energy reduction was achieved in training using the proposed accuracy-lossless DeltaFrame-BP algorithm in comparison with the standard back-propagation algorithm.
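To illustrate the core idea described in the abstract, the following is a minimal sketch of streaming frame differences instead of raw frames. It assumes the input is a sequence of grayscale NumPy arrays; the function name `frame_delta_stream`, the quantization choice, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def frame_delta_stream(frames):
    """Yield the first frame as-is, then successive frame differences.

    Because consecutive frames of high-fps video are highly similar,
    the difference (delta) frames tend to be sparse (many zeros) and
    have a low dynamic range compared with raw pixel values.
    """
    prev = None
    for frame in frames:
        frame = frame.astype(np.int16)  # allow signed differences
        if prev is None:
            yield frame                 # key frame streamed unchanged
        else:
            yield frame - prev          # delta frame: sparse, small values
        prev = frame

# Toy usage: synthesize nearly identical frames and inspect the deltas.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
frames = [
    np.clip(base.astype(np.int16) + rng.integers(-2, 3, size=base.shape),
            0, 255).astype(np.uint8)
    for _ in range(5)
]

for i, d in enumerate(frame_delta_stream(frames)):
    if i == 0:
        continue  # skip the key frame
    zero_fraction = np.mean(d == 0)
    print(f"delta {i}: zero fraction = {zero_fraction:.2f}, "
          f"max |value| = {np.abs(d).max()}")
```

In this toy example the delta frames concentrate around zero with a small maximum magnitude, which is the property DeltaFrame-BP exploits to reduce computation and energy during CNN training and inference.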