W. Brewer, G. Behm, A. Scheinine, Ben Parsons, Wesley Emeneker, Robert P. Trevino
{"title":"Inference Benchmarking on HPC Systems","authors":"W. Brewer, G. Behm, A. Scheinine, Ben Parsons, Wesley Emeneker, Robert P. Trevino","doi":"10.1109/HPEC43674.2020.9286138","DOIUrl":null,"url":null,"abstract":"As deep learning on edge computing systems has become more prevalent, investigation of architectures and configurations for optimal inference performance has become a critical step for proposed artificial intelligence solutions. While there has been considerable work in the development of hardware and software for high performance inferencing, there is little known about the performance of such systems on HPC architectures. In this paper, we address outstanding questions on the parallel inference performance on HPC systems. We report results and recommendations derived from evaluating iBench on multiple platforms in a variety of HPC configurations. We systematically benchmark single-GPU performance, single-node performance, and multi-node performance for maximum client-side and server-side inference throughput. In order to achieve linear speedup, we show that concurrent sending clients must be used, as opposed to sending large batch payloads parallelized across multiple GPUs. We show that client/server inferencing architectures add a considerable data transfer component that needs to be taken into consideration when benchmarking HPC system that benchmarks such as MLPerf do not measure. Finally, we investigate energy efficiency of GPUs for different levels of concurrency and batch sizes to report optimal configurations that minimize cost per inference.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
As deep learning on edge computing systems has become more prevalent, investigating architectures and configurations for optimal inference performance has become a critical step for proposed artificial intelligence solutions. While there has been considerable work on hardware and software for high-performance inferencing, little is known about the performance of such systems on HPC architectures. In this paper, we address outstanding questions about parallel inference performance on HPC systems. We report results and recommendations derived from evaluating iBench on multiple platforms in a variety of HPC configurations. We systematically benchmark single-GPU, single-node, and multi-node performance for maximum client-side and server-side inference throughput. We show that achieving linear speedup requires many concurrent sending clients, as opposed to sending large batch payloads parallelized across multiple GPUs. We also show that client/server inferencing architectures add a considerable data-transfer component that must be taken into account when benchmarking HPC systems, a component that benchmarks such as MLPerf do not measure. Finally, we investigate the energy efficiency of GPUs at different levels of concurrency and batch sizes and report optimal configurations that minimize cost per inference.
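To make the concurrent-client pattern from the abstract concrete, the following is a minimal client-side sketch. It is not iBench itself; it assumes a hypothetical TensorFlow-Serving-style REST endpoint (SERVER_URL), and the model name, batch size, and client count are illustrative placeholders. The key point it demonstrates is that throughput is driven by many concurrent senders rather than one client posting a single large batch.

# Minimal sketch of client-side concurrent inference benchmarking (assumptions noted above).
import time
from concurrent.futures import ThreadPoolExecutor

import requests

SERVER_URL = "http://localhost:8501/v1/models/resnet50:predict"  # hypothetical endpoint
NUM_CLIENTS = 16          # number of concurrent sending clients
REQUESTS_PER_CLIENT = 100 # requests issued by each client
BATCH_SIZE = 8            # samples per request payload

def one_client(client_id: int) -> int:
    """Send REQUESTS_PER_CLIENT requests and return the number of samples inferred."""
    # Dummy input: a real benchmark would send image tensors of the model's input shape.
    payload = {"instances": [[0.0, 0.0, 0.0, 0.0]] * BATCH_SIZE}
    done = 0
    for _ in range(REQUESTS_PER_CLIENT):
        resp = requests.post(SERVER_URL, json=payload, timeout=60)
        resp.raise_for_status()
        done += BATCH_SIZE
    return done

if __name__ == "__main__":
    start = time.perf_counter()
    # Each worker thread acts as an independent sending client.
    with ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
        total = sum(pool.map(one_client, range(NUM_CLIENTS)))
    elapsed = time.perf_counter() - start
    print(f"{total} inferences in {elapsed:.2f} s -> {total / elapsed:.1f} inf/s")

Sweeping NUM_CLIENTS and BATCH_SIZE in a harness like this exposes the client/server data-transfer overhead the abstract highlights, which a purely server-side benchmark would not capture.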