Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding

Lipeng Wang, Qiong Luo, Shengen Yan
{"title":"Accelerating Deep Learning Tasks with Optimized GPU-assisted Image Decoding","authors":"Lipeng Wang, Qiong Luo, Shengen Yan","doi":"10.1109/ICPADS51040.2020.00045","DOIUrl":null,"url":null,"abstract":"In computer vision deep learning (DL) tasks, most of the input image datasets are stored in the JPEG format. These JPEG datasets need to be decoded before DL tasks are performed on them. We observe two problems in the current JPEG decoding procedures for DL tasks: (1) the decoding of image entropy data in the decoder is performed sequentially, and this sequential decoding repeats with the DL iterations, which takes significant time; (2) Current parallel decoding methods under-utilize the massive hardware threads on GPUs. To reduce the image decoding time, we introduce a pre-scan mechanism to avoid the repeated image scanning in DL tasks. Our pre-scan generates boundary markers for entropy data so that the decoding can be performed in parallel. To cooperate with the existing dataset storage and caching systems, we propose two modes of the pre-scan mechanism: a compatible mode and a fast mode. The compatible mode does not change the image file structure so pre-scanned files can be stored back to disk for subsequent DL tasks. In comparison, the fast mode crafts a JPEG image into a binary format suitable for parallel decoding, which can be processed directly on the GPU. Since the GPU has thousands of hardware threads, we propose a fine-grained parallel decoding method on the pre-scanned dataset. 
The fine-grained parallelism utilizes the GPU effectively, and achieves speedups of around 1.5× over existing GPU-assisted image decoding libraries on real-world DL tasks.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS51040.2020.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In computer vision deep learning (DL) tasks, most input image datasets are stored in the JPEG format, and these JPEG datasets must be decoded before DL tasks can run on them. We observe two problems in current JPEG decoding procedures for DL tasks: (1) the decoding of image entropy data is performed sequentially, and this sequential decoding repeats across DL iterations, which takes significant time; (2) current parallel decoding methods under-utilize the massive number of hardware threads on GPUs. To reduce image decoding time, we introduce a pre-scan mechanism that avoids repeated image scanning in DL tasks. Our pre-scan generates boundary markers for the entropy data so that decoding can be performed in parallel. To integrate with existing dataset storage and caching systems, we propose two modes for the pre-scan mechanism: a compatible mode and a fast mode. The compatible mode does not change the image file structure, so pre-scanned files can be stored back to disk for subsequent DL tasks. In comparison, the fast mode transforms a JPEG image into a binary format suitable for parallel decoding, which can be processed directly on the GPU. Since the GPU has thousands of hardware threads, we propose a fine-grained parallel decoding method over the pre-scanned dataset. The fine-grained parallelism utilizes the GPU effectively and achieves speedups of around 1.5× over existing GPU-assisted image decoding libraries on real-world DL tasks.
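The core obstacle the abstract describes is that JPEG entropy data has no known internal boundaries until it is decoded, forcing sequential decoding. A minimal illustration of the pre-scan idea, not the paper's actual implementation: in standard JPEG, restart markers (bytes 0xFFD0 through 0xFFD7) are the one kind of boundary that can be found by a cheap byte scan, because byte-stuffing guarantees 0xFF inside entropy data is always followed by 0x00. The sketch below scans an entropy-coded byte stream once and returns segment offsets; each segment could then be decoded independently, e.g. by a separate GPU thread. The function name is illustrative.

```python
def prescan_entropy_segments(data: bytes) -> list[tuple[int, int]]:
    """One-pass scan of JPEG entropy-coded data: split it at restart
    markers (0xFF 0xD0..0xD7) into independently decodable segments.

    Returns a list of (start, end) byte offsets, marker bytes excluded.
    Stuffed bytes (0xFF 0x00) are not markers and are skipped naturally.
    """
    segments = []
    start = 0
    i = 0
    while i < len(data) - 1:
        if data[i] == 0xFF and 0xD0 <= data[i + 1] <= 0xD7:
            segments.append((start, i))  # segment ends before the marker
            start = i + 2                # next segment begins after it
            i += 2
        else:
            i += 1
    segments.append((start, len(data)))  # trailing segment
    return segments
```

Once the segment table exists, it can be cached alongside the dataset (roughly the role of the compatible mode) or the segments can be repacked into a flat binary layout for direct GPU consumption (roughly the fast mode), so the scan cost is paid once rather than on every training epoch.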