{"title":"DeltaFrame BP:一种基于帧差分的深度卷积神经网络视频数据训练与推理算法","authors":"Bing Han;Kaushik Roy","doi":"10.1109/TMSCS.2018.2865303","DOIUrl":null,"url":null,"abstract":"Inspired by the success of deep convolutional neural networks (CNNs) with back-propagation (BP) training on large-scale image recognition tasks, recent research efforts concentrated on expending deep CNNs toward more challenging automatized video analysis, such as video classification, object tracking, action recognition and optical flow detection. Video comprises a sequence of images (frames) captured over time in which image data is a function of space and time. Extracting three-dimensional spatial-temporal features from multiple frames becomes a key ingredient for capturing and incorporating appearance and dynamic representations using deep CNNs. Hence, training deep CNNs on video involves significant computational resources and energy consumption due to extended number of frames across the time line of video length. We propose DeltaFrame-BP, a deep learning algorithm, which significantly reduces computational cost and energy consumption without accuracy degradation by streaming frame differences for deep CNNs training and inference. The inherent similarity between video frames due to high fps (frames per second) in video recording helps achieving high-sparsity and low-dynamic range data streaming using frame differences in comparison with raw video frames. According to our simulation, nearly 25 percent energy reduction was achieved in training using the proposed accuracy-lossless DeltaFrame-BP algorithm in comparison with the standard Back-propagation algorithm.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"624-634"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2865303","citationCount":"6","resultStr":"{\"title\":\"DeltaFrame-BP: An Algorithm Using Frame Difference for Deep Convolutional Neural Networks Training and Inference on Video Data\",\"authors\":\"Bing Han;Kaushik Roy\",\"doi\":\"10.1109/TMSCS.2018.2865303\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Inspired by the success of deep convolutional neural networks (CNNs) with back-propagation (BP) training on large-scale image recognition tasks, recent research efforts concentrated on expending deep CNNs toward more challenging automatized video analysis, such as video classification, object tracking, action recognition and optical flow detection. Video comprises a sequence of images (frames) captured over time in which image data is a function of space and time. Extracting three-dimensional spatial-temporal features from multiple frames becomes a key ingredient for capturing and incorporating appearance and dynamic representations using deep CNNs. Hence, training deep CNNs on video involves significant computational resources and energy consumption due to extended number of frames across the time line of video length. We propose DeltaFrame-BP, a deep learning algorithm, which significantly reduces computational cost and energy consumption without accuracy degradation by streaming frame differences for deep CNNs training and inference. The inherent similarity between video frames due to high fps (frames per second) in video recording helps achieving high-sparsity and low-dynamic range data streaming using frame differences in comparison with raw video frames. 
According to our simulation, nearly 25 percent energy reduction was achieved in training using the proposed accuracy-lossless DeltaFrame-BP algorithm in comparison with the standard Back-propagation algorithm.\",\"PeriodicalId\":100643,\"journal\":{\"name\":\"IEEE Transactions on Multi-Scale Computing Systems\",\"volume\":\"4 4\",\"pages\":\"624-634\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2865303\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multi-Scale Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/8434331/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multi-Scale Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/8434331/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DeltaFrame-BP: An Algorithm Using Frame Difference for Deep Convolutional Neural Networks Training and Inference on Video Data
Bing Han; Kaushik Roy
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 4, pp. 624-634, 2018. DOI: 10.1109/TMSCS.2018.2865303
Inspired by the success of deep convolutional neural networks (CNNs) with back-propagation (BP) training on large-scale image recognition tasks, recent research efforts have concentrated on extending deep CNNs toward more challenging automated video analysis tasks such as video classification, object tracking, action recognition, and optical flow estimation. A video comprises a sequence of images (frames) captured over time, so its image data is a function of both space and time. Extracting three-dimensional spatio-temporal features from multiple frames is therefore a key ingredient for capturing both appearance and dynamics with deep CNNs. However, training deep CNNs on video demands significant computational resources and energy because of the large number of frames spanning the length of a video. We propose DeltaFrame-BP, a deep learning algorithm that significantly reduces computational cost and energy consumption without degrading accuracy by streaming frame differences, rather than raw frames, through deep CNNs during both training and inference. Because video is recorded at a high frame rate (fps), consecutive frames are inherently similar, so frame differences form a high-sparsity, low-dynamic-range data stream compared with the raw frames. In our simulations, the proposed accuracy-lossless DeltaFrame-BP algorithm reduced training energy by nearly 25 percent compared with standard back-propagation.
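To make the core premise concrete, the following minimal sketch (not the authors' implementation) converts a toy clip into a key frame plus thresholded frame differences and measures their sparsity and dynamic range. The function name delta_stream, the threshold eps, and the synthetic clip are illustrative assumptions introduced here for demonstration only; the paper's actual delta encoding and training pipeline may differ.

```python
import numpy as np

def delta_stream(frames, eps=4):
    """Turn a clip into a key frame plus thresholded frame differences.

    frames: uint8 array of shape (T, H, W, C).
    eps:    magnitude below which a delta is zeroed (an illustrative
            knob; the paper's exact quantization scheme may differ).
    """
    frames = frames.astype(np.int16)      # widen so differences can be negative
    deltas = np.diff(frames, axis=0)      # frame_t - frame_{t-1}
    deltas[np.abs(deltas) < eps] = 0      # near-identical pixels -> exact zeros
    return frames[0], deltas              # dense key frame + sparse delta stream

# Toy clip: a static background with a small patch drifting downward,
# standing in for consecutive frames of a high-fps recording.
rng = np.random.default_rng(0)
background = rng.integers(0, 200, size=(32, 32, 3), dtype=np.uint8)
clip = np.repeat(background[None], 8, axis=0)
for t in range(8):
    clip[t, t:t + 4, :4, :] += 50         # the moving patch

key_frame, deltas = delta_stream(clip)
print(f"delta sparsity:      {np.mean(deltas == 0):.1%}")        # mostly zeros
print(f"delta dynamic range: [{deltas.min()}, {deltas.max()}]")  # vs. [0, 255] raw
```

In a full DeltaFrame-BP-style pipeline, the key frame would be processed densely once and each sparse, low-range delta would drive incremental computation in the CNN layers; the sketch only demonstrates why consecutive frames of a high-fps recording yield mostly-zero, low-dynamic-range differences.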