Optimizing Convolutions for an Inference Accelerator: Case Study: Intel's NNP-I 1000 DL Compute Grid

Durgadoss, Kausik Maiti, Sanju C Sudhakaran, Isha Agarwal, Kartik Podugu, Pavan Kumar, Jitender Patil, A. Chawla

Proceedings of the First International Conference on AI-ML Systems, 2021-10-21. DOI: 10.1145/3486001.3486239
With Deep Learning (DL) surpassing humans in Image Recognition and Machine Translation tasks, the demand for specialized hardware has grown in recent years. DL Accelerators belong to a category of such purpose-built hardware that promises compelling performance for Neural Net computations. But specialized hardware needs a powerful compiler to unlock its full potential. This paper discusses the Code Generator and Optimizer (CGO), which produces optimized tiling as well as scheduling of Convolution operations for the DL Compute Grid in Intel's NNP-I 1000 platform. This paper also presents some of the key optimization techniques used and the associated performance gains across a rich variety of Deep Learning workloads.
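The abstract mentions tiling convolution operations as a key compiler optimization. The following toy sketch illustrates the general idea of tiling a convolution's output loops so that each tile's working set can fit in an accelerator's fast local memory; it is not the paper's CGO, and all names (`conv2d_direct`, `conv2d_tiled`, the `tile` parameter) are hypothetical illustrations.

```python
def conv2d_direct(inp, kernel):
    """Naive valid-mode 2D convolution (strictly, cross-correlation)."""
    H, W = len(inp), len(inp[0])
    KH, KW = len(kernel), len(kernel[0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[0.0] * OW for _ in range(OH)]
    for oy in range(OH):
        for ox in range(OW):
            for ky in range(KH):
                for kx in range(KW):
                    out[oy][ox] += inp[oy + ky][ox + kx] * kernel[ky][kx]
    return out

def conv2d_tiled(inp, kernel, tile=4):
    """Same convolution with the output loops tiled. On an accelerator,
    each (tile x tile) output block and the input patch it reads could be
    staged in local memory before the inner loops run."""
    H, W = len(inp), len(inp[0])
    KH, KW = len(kernel), len(kernel[0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[0.0] * OW for _ in range(OH)]
    for ty in range(0, OH, tile):          # iterate over output tiles
        for tx in range(0, OW, tile):
            # edge tiles are clipped to the output bounds
            for oy in range(ty, min(ty + tile, OH)):
                for ox in range(tx, min(tx + tile, OW)):
                    acc = 0.0
                    for ky in range(KH):
                        for kx in range(KW):
                            acc += inp[oy + ky][ox + kx] * kernel[ky][kx]
                    out[oy][ox] = acc
    return out
```

Both routines compute the same result; the tiled version only reorders the traversal of output pixels, which is what makes tile size a schedule parameter a code generator can search over.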