Performance analysis of CNN frameworks for GPUs

Heehoon Kim, Hyoungwook Nam, Wookeun Jung, Jaejin Lee
{"title":"基于gpu的CNN框架性能分析","authors":"Heehoon Kim, Hyoungwook Nam, Wookeun Jung, Jaejin Lee","doi":"10.1109/ISPASS.2017.7975270","DOIUrl":null,"url":null,"abstract":"Thanks to modern deep learning frameworks that exploit GPUs, convolutional neural networks (CNNs) have been greatly successful in visual recognition tasks. In this paper, we analyze the GPU performance characteristics of five popular deep learning frameworks: Caffe, CNTK, TensorFlow, Theano, and Torch in the perspective of a representative CNN model, AlexNet. Based on the characteristics obtained, we suggest possible optimization methods to increase the efficiency of CNN models built by the frameworks. We also show the GPU performance characteristics of different convolution algorithms each of which uses one of GEMM, direct convolution, FFT, and the Winograd method. We also suggest criteria to choose convolution algorithms for GPUs and methods to build efficient CNN models on GPUs. Since scaling DNNs in a multi-GPU context becomes increasingly important, we also analyze the scalability of the CNN models built by the deep learning frameworks in the multi-GPU context and their overhead. The result indicates that we can increase the speed of training the AlexNet model up to 2X by just changing options provided by the frameworks.","PeriodicalId":123307,"journal":{"name":"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"78","resultStr":"{\"title\":\"Performance analysis of CNN frameworks for GPUs\",\"authors\":\"Heehoon Kim, Hyoungwook Nam, Wookeun Jung, Jaejin Lee\",\"doi\":\"10.1109/ISPASS.2017.7975270\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Thanks to modern deep learning frameworks that exploit GPUs, convolutional neural networks (CNNs) have been greatly successful in visual recognition tasks. In this paper, we analyze the GPU performance characteristics of five popular deep learning frameworks: Caffe, CNTK, TensorFlow, Theano, and Torch in the perspective of a representative CNN model, AlexNet. Based on the characteristics obtained, we suggest possible optimization methods to increase the efficiency of CNN models built by the frameworks. We also show the GPU performance characteristics of different convolution algorithms each of which uses one of GEMM, direct convolution, FFT, and the Winograd method. We also suggest criteria to choose convolution algorithms for GPUs and methods to build efficient CNN models on GPUs. Since scaling DNNs in a multi-GPU context becomes increasingly important, we also analyze the scalability of the CNN models built by the deep learning frameworks in the multi-GPU context and their overhead. 
The result indicates that we can increase the speed of training the AlexNet model up to 2X by just changing options provided by the frameworks.\",\"PeriodicalId\":123307,\"journal\":{\"name\":\"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"78\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPASS.2017.7975270\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2017.7975270","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 78

Abstract

Thanks to modern deep learning frameworks that exploit GPUs, convolutional neural networks (CNNs) have been greatly successful in visual recognition tasks. In this paper, we analyze the GPU performance characteristics of five popular deep learning frameworks, Caffe, CNTK, TensorFlow, Theano, and Torch, from the perspective of a representative CNN model, AlexNet. Based on the characteristics obtained, we suggest possible optimization methods to increase the efficiency of CNN models built with these frameworks. We also show the GPU performance characteristics of different convolution algorithms, each of which uses one of GEMM, direct convolution, FFT, or the Winograd method. In addition, we suggest criteria for choosing convolution algorithms for GPUs and methods for building efficient CNN models on GPUs. Since scaling DNNs across multiple GPUs is becoming increasingly important, we also analyze the scalability of the CNN models built with these frameworks in the multi-GPU setting, along with the associated overhead. The results indicate that the training speed of the AlexNet model can be increased by up to 2X just by changing options provided by the frameworks.
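The abstract compares convolution algorithms (GEMM-based, direct, FFT, Winograd) at the level of their GPU performance characteristics. As a point of reference for what "GEMM-based convolution" means, the minimal sketch below contrasts direct convolution with the im2col + GEMM formulation on the CPU using NumPy. This is an illustrative assumption about how the two formulations relate, not code from the paper, which benchmarks GPU library kernels (e.g., cuDNN) rather than host code; all function and variable names here are hypothetical.

```python
# Minimal sketch (not from the paper): direct convolution vs. GEMM-based
# convolution via im2col, checked against each other on random data.
import numpy as np

def conv2d_direct(x, w):
    """Direct convolution: x is (C, H, W), w is (K, C, R, S); stride 1, no padding."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    out_h, out_w = H - R + 1, W - S + 1
    y = np.zeros((K, out_h, out_w))
    for k in range(K):
        for i in range(out_h):
            for j in range(out_w):
                # Multiply-accumulate over one receptive field for one filter.
                y[k, i, j] = np.sum(x[:, i:i+R, j:j+S] * w[k])
    return y

def conv2d_im2col_gemm(x, w):
    """GEMM-based convolution: unfold input patches into a matrix, then one matmul."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    out_h, out_w = H - R + 1, W - S + 1
    # Each column holds one receptive field flattened to length C*R*S.
    cols = np.empty((C * R * S, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i+R, j:j+S].ravel()
            idx += 1
    # Filters become a (K, C*R*S) matrix; convolution reduces to a single GEMM.
    y = w.reshape(K, -1) @ cols
    return y.reshape(K, out_h, out_w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))     # C=3 channels, 8x8 input
    w = rng.standard_normal((4, 3, 3, 3))  # K=4 filters of size 3x3
    assert np.allclose(conv2d_direct(x, w), conv2d_im2col_gemm(x, w))
    print("direct and im2col+GEMM convolutions agree")
```

Lowering convolution to a single GEMM is what makes highly tuned matrix-multiply kernels attractive on GPUs, at the cost of materializing the im2col buffer; FFT- and Winograd-based approaches instead trade memory and transform overhead for fewer arithmetic operations, which tends to favor larger filters (FFT) or small 3x3 filters (Winograd).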