{"title":"AC922架构对深度神经网络训练性能的影响","authors":"P. Rosciszewski, Michał Iwański, P. Czarnul","doi":"10.1109/HPCS48598.2019.9188164","DOIUrl":null,"url":null,"abstract":"Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report performance results depending on batch sizes and GPU selection and compare them with the results from another contemporary workstation based on the same set of GPUs – NVIDIA® DGX Station™. The results show that the AC922 performs better in all tested configurations, achieving improvements up to 10.3%. Profiling indicates that the improvement is due to the efficient I/O pipeline. The performance differences depend on the specific model, rather than on the model class (RNN/CNN). Both systems offer good scalability up to 4 GPUs. In certain cases there is a significant difference in performance depending on exactly which GPUs are used for computations.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"61 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"The impact of the AC922 Architecture on Performance of Deep Neural Network Training\",\"authors\":\"P. Rosciszewski, Michał Iwański, P. Czarnul\",\"doi\":\"10.1109/HPCS48598.2019.9188164\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report performance results depending on batch sizes and GPU selection and compare them with the results from another contemporary workstation based on the same set of GPUs – NVIDIA® DGX Station™. The results show that the AC922 performs better in all tested configurations, achieving improvements up to 10.3%. Profiling indicates that the improvement is due to the efficient I/O pipeline. The performance differences depend on the specific model, rather than on the model class (RNN/CNN). Both systems offer good scalability up to 4 GPUs. In certain cases there is a significant difference in performance depending on exactly which GPUs are used for computations.\",\"PeriodicalId\":371856,\"journal\":{\"name\":\"2019 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"61 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCS48598.2019.9188164\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCS48598.2019.9188164","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
摘要
实际的深度学习应用需要越来越多的计算能力。新的计算架构出现了,专门为人工智能应用设计,包括IBM Power System AC922。在本文中,我们面对一个配备4个NVIDIA Volta V100 gpu的AC922 (8335-GTG)服务器,并选择深度神经网络训练应用,包括四个卷积模型和一个循环模型。我们根据批处理大小和GPU选择报告性能结果,并将其与基于同一组GPU的另一个现代工作站(NVIDIA®DGX Station™)的结果进行比较。结果表明,AC922在所有测试配置中都表现较好,实现了10.3%的改进。分析表明,改进是由于高效的I/O管道。性能差异取决于特定的模型,而不是模型类(RNN/CNN)。这两个系统都提供了良好的可扩展性,最多可达4个gpu。在某些情况下,根据使用哪种gpu进行计算,性能会有显著差异。
The impact of the AC922 Architecture on Performance of Deep Neural Network Training
Practical deep learning applications require more and more computing power. New computing architectures emerge, specifically designed for the artificial intelligence applications, including the IBM Power System AC922. In this paper we confront an AC922 (8335-GTG) server equipped with 4 NVIDIA Volta V100 GPUs with selected deep neural network training applications, including four convolutional and one recurrent model. We report performance results depending on batch sizes and GPU selection and compare them with the results from another contemporary workstation based on the same set of GPUs – NVIDIA® DGX Station™. The results show that the AC922 performs better in all tested configurations, achieving improvements up to 10.3%. Profiling indicates that the improvement is due to the efficient I/O pipeline. The performance differences depend on the specific model, rather than on the model class (RNN/CNN). Both systems offer good scalability up to 4 GPUs. In certain cases there is a significant difference in performance depending on exactly which GPUs are used for computations.