Optimizing Convolutions for an Inference Accelerator: Case Study: Intel's NNP-I 1000 DL Compute Grid

Durgadoss, Kausik Maiti, Sanju C Sudhakaran, Isha Agarwal, Kartik Podugu, Pavan Kumar, Jitender Patil, A. Chawla

Proceedings of the First International Conference on AI-ML Systems, 2021-10-21. DOI: 10.1145/3486001.3486239

Citations: 0
Abstract
With Deep Learning (DL) surpassing humans on image-recognition and machine-translation tasks, demand for specialized hardware has grown in the recent past. DL accelerators are one category of such purpose-built hardware, promising compelling performance for neural-network computations. But specialized hardware needs a powerful compiler to unlock its full potential. This paper discusses the Code Generator and Optimizer (CGO), which produces optimized tilings and schedules of convolution operations for the DL Compute Grid in Intel's NNP-I 1000 platform. The paper also presents some of the key optimization techniques used and the associated performance gains across a rich variety of deep-learning workloads.
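The CGO described in the abstract is not publicly available, but the core idea it automates, tiling a convolution so each output block touches only a small input region, can be illustrated with a minimal NumPy sketch. All function names and the tile size below are hypothetical choices for illustration, not the paper's actual implementation:

```python
import numpy as np

def conv2d_direct(x, w):
    # Naive "valid" 2-D convolution (cross-correlation, as in DL frameworks).
    H, W = x.shape
    KH, KW = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    y = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            y[i, j] = np.sum(x[i:i + KH, j:j + KW] * w)
    return y

def conv2d_tiled(x, w, tile=4):
    # Output-stationary tiling: compute the output in tile x tile blocks.
    # Each block reads only its input window plus a (KH-1, KW-1) halo,
    # the kind of locality a tiling code generator aims to exploit.
    H, W = x.shape
    KH, KW = w.shape
    OH, OW = H - KH + 1, W - KW + 1
    y = np.zeros((OH, OW))
    for ti in range(0, OH, tile):
        for tj in range(0, OW, tile):
            th = min(tile, OH - ti)          # handle partial edge tiles
            tw = min(tile, OW - tj)
            x_tile = x[ti:ti + th + KH - 1, tj:tj + tw + KW - 1]
            y[ti:ti + th, tj:tj + tw] = conv2d_direct(x_tile, w)
    return y
```

On real accelerator hardware the payoff comes from keeping each tile (and its halo) resident in fast local memory; the compiler's job, as the abstract describes, is to pick tile shapes and schedules that maximize that reuse.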