Time-series ML-regression on Graphcore IPU-M2000 and Nvidia A100

Jan Balewski, Z. Liu, A. Tsyplikhin, Manuel Lopez Roland, Kristofer E Bouchard

2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2022. DOI: 10.1109/PMBS56514.2022.00019
We compare the ML-training performance of a Graphcore IPU-M2000-based system with an Nvidia A100 GPU-based system on the Perlmutter HPC machine at NERSC/LBL. The scientific benchmark problem was multivariate regression of time-series data from a simulated biological neuron. The ML model consisted of several convolutional, batch-normalization, and fully connected layers. The training data were held in CPU memory to eliminate system-dependent I/O cost. Data-parallel training achieved the same sample throughput on GC200 IPUs and A100 GPUs for any number of accelerators between 1 and 256. The best MSE validation loss achieved on the IPUs was only 10% to 20% larger than on the GPUs. The aggregate energy use per training epoch was 2.5 to 3 times lower for the Graphcore system than for the Nvidia system. This paper also discusses aspects of software-hardware co-design for achieving the highest efficiency on the IPU using PopTorch.
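The abstract names the model's ingredients (convolutional, batch-normalization, and fully connected layers) and the framework used to run it data-parallel on IPUs (PopTorch). The sketch below is not the authors' code: layer widths, kernel sizes, the number of regression targets, the synthetic data, and all variable names are illustrative assumptions; only the PopTorch calls shown (poptorch.Options, replicationFactor, deviceIterations, trainingModel, DataLoader) are standard API.

```python
# Minimal sketch, assuming a hypothetical layer layout: a 1-D conv/batch-norm/FC
# regressor wrapped with PopTorch for data-parallel training on IPUs.
import torch
import poptorch


class TimeSeriesRegressor(torch.nn.Module):
    def __init__(self, in_channels=1, num_outputs=4):
        super().__init__()
        self.loss = torch.nn.MSELoss()
        # Convolutional + batch-normalization feature extractor.
        self.features = torch.nn.Sequential(
            torch.nn.Conv1d(in_channels, 32, kernel_size=7, stride=2),
            torch.nn.BatchNorm1d(32),
            torch.nn.ReLU(),
            torch.nn.Conv1d(32, 64, kernel_size=7, stride=2),
            torch.nn.BatchNorm1d(64),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool1d(1),
        )
        # Fully connected regression head.
        self.head = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(64, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, num_outputs),
        )

    def forward(self, x, target=None):
        out = self.head(self.features(x))
        if target is not None:
            # PopTorch compiles the loss into the training graph, so it must
            # be computed inside forward().
            return out, self.loss(out, target)
        return out


# Replication is PopTorch's data-parallel mechanism: one model replica per IPU.
opts = poptorch.Options()
opts.replicationFactor(4)   # e.g. 4 IPUs; the paper scales from 1 to 256 accelerators
opts.deviceIterations(4)    # batches processed per host/device interaction

model = TimeSeriesRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
training_model = poptorch.trainingModel(model, options=opts, optimizer=optimizer)

# Synthetic stand-in for the simulated-neuron training set, held in host memory.
waves = torch.randn(4096, 1, 1600)
targets = torch.randn(4096, 4)
dataset = torch.utils.data.TensorDataset(waves, targets)
# poptorch.DataLoader folds replication and device iterations into the batching.
loader = poptorch.DataLoader(opts, dataset, batch_size=32, shuffle=True)

for wave, target in loader:
    _, loss = training_model(wave, target)  # forward, backward, and update on IPU
```

Note that the effective batch per weight update is batch_size × replicationFactor × deviceIterations; keeping that product comparable between the IPU and GPU configurations is one of the tuning knobs a PopTorch port exposes, in the spirit of the co-design discussion the paper mentions.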