Performance Optimization of Sparse Deep Neural Networks Based on GPU
Authors: Yucheng Shi, Long Ren
Venue: Proceedings of the 6th International Conference on High Performance Compilation, Computing and Communications
Publication date: 2022-06-23
DOI: 10.1145/3546000.3546011 (https://doi.org/10.1145/3546000.3546011)
Deep neural networks are widely used in many fields. Because the latest deep neural networks are very large, sparsity in deep neural networks is an active research topic. Implementing a sparse deep neural network on a GPU can further accelerate its computation. The CUDA GPU implementation far outperforms the MATLAB CPU implementation, which confirms the advantage of running sparse deep neural networks on a GPU. The CUDA implementation is also up to 1.61x faster than the cuSPARSE version when the deep neural network has 1024 neurons and 1920 layers.
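The abstract does not describe the kernel itself, so the following is only a minimal sketch of the kind of computation a sparse deep neural network performs at inference time, not the authors' implementation. It assumes each layer's weight matrix is stored in compressed sparse row (CSR) form and that each layer computes ReLU(W x + bias); the function names, the CSR layout, and the single scalar bias are all illustrative assumptions. It is written in plain Python for readability. On a GPU, the outer loop over rows is the part that would typically be spread across threads.

```python
def csr_layer_forward(indptr, indices, data, x, bias=0.0):
    """Compute y = ReLU(W @ x + bias) for one layer.

    W is given in CSR form (an assumed layout, not taken from the paper):
      indptr[r]..indptr[r+1] delimits the stored nonzeros of row r,
      indices[k] is the column of the k-th nonzero,
      data[k] is its value.
    """
    y = []
    for row in range(len(indptr) - 1):
        acc = bias
        # Only the stored nonzeros of this row are visited,
        # which is where the saving over a dense layer comes from.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(max(acc, 0.0))  # ReLU activation
    return y


def sparse_dnn_forward(layers, x, bias=0.0):
    """Apply a list of CSR layers in sequence (hypothetical helper)."""
    for indptr, indices, data in layers:
        x = csr_layer_forward(indptr, indices, data, x, bias)
    return x


# Example: W = [[1, 0, 2], [0, 0, 0], [0, -3, 0]] stored in CSR form.
W = ([0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, -3.0])
print(sparse_dnn_forward([W, W], [1.0, 1.0, 1.0]))
```

In this sketch the work per layer scales with the number of stored nonzeros rather than with the full matrix size, which is the usual motivation for exploiting sparsity in large networks.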