{"title":"Arm Neoverse N2: Arm’s 2nd generation high performance infrastructure CPUs and system IPs","authors":"Andrea Pellegrini","doi":"10.1109/HCS52781.2021.9567483","DOIUrl":"https://doi.org/10.1109/HCS52781.2021.9567483","url":null,"abstract":"This benchmark presentation made by Arm Ltd and its subsidiaries (Arm) contains forward-looking statements and information. The information contained herein is therefore provided by Arm on an “as-is” basis without warranty or liability of any kind. While Arm has made every attempt to ensure that the information contained in the benchmark presentation is accurate and reliable at the time of its publication, it cannot accept responsibility for any errors, omissions or inaccuracies or for the results obtained from the use of such information and should be used for guidance purposes only and is not intended to replace discussions with a duly appointed representative of Arm. Any results or comparisons shown are for general information purposes only and any particular data or analysis should not be interpreted as demonstrating a cause and effect relationship. Comparable performance on any performance indicator does not guarantee comparable performance on any other performance indicator.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"686 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122977633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge Inference Engine for Deep & Random Sparse Neural Networks with 4-bit Cartesian-Product MAC Array and Pipelined Activation Aligner","authors":"Kota Ando, Jaehoon Yu, Kazutoshi Hirose, Hiroki Nakahara, Kazushi Kawamura, Thiem Van Chu, M. Motomura","doi":"10.1109/HCS52781.2021.9567328","DOIUrl":"https://doi.org/10.1109/HCS52781.2021.9567328","url":null,"abstract":"A 4b-quantized convolutional neural network (CNN) inference engine for edge-AI is presented featuring a Cartesian-product MAC array and pipelined activation aligners targeting deep-/random-pruned models. A 40nm prototype with 32x32 MACs and 5Mb SRAM runs at 534 MHz, 1.07 TOPS, 352 mW at 1.1V, and attains 5.30 dense TOPS/W, 234 MHz at 0.8V. Sparse TOPS/W reaches 26.5 when running a randomly pruned model (after 88% pruning). Training algorithms for obtaining highly efficient sparse/quantized models are also proposed.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129796048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ENIAD: A Reconfigurable Near-data Processing Architecture for Web-Scale AI-enriched Big Data Service","authors":"Jialiang Zhang, JingJane Li","doi":"10.1109/HCS52781.2021.9567229","DOIUrl":"https://doi.org/10.1109/HCS52781.2021.9567229","url":null,"abstract":"To meet the surging demands required by AI-enriched Big Data services, cloud vendors are turning toward domain specific accelerators for improved efficiency, scalability and performance. ENIAD, the first end-to-end infrastructure for AI-enriched Big Data serving in real time, accelerates both deep neural network inferencing and billion-scale indexing at the data-center scale. Exploiting near-data computation, reconfigurable computing and rapid/agile hardware deployment flow, ENIAD serves state-of-the-art, online built indexing service with high efficiency at low batch sizes. A high-performance, index (data)-adaptable FPGA soft processor is at the heart of the system and able to serve 10x larger index size with 14x lower latency compared to state-of-the-art CPU and GPU architectures.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121809756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Qualcomm® Cloud AI 100 : 12TOPS/W Scalable, High Performance and Low Latency Deep Learning Inference Accelerator","authors":"K. Chatha","doi":"10.1109/HCS52781.2021.9567417","DOIUrl":"https://doi.org/10.1109/HCS52781.2021.9567417","url":null,"abstract":"The future of AI in the data center demands breakthrough technology: compute power is not keeping up with business needs to deliver best-in-class services. AI is ubiquitous in the data center. AI is fundamental to next-generation business analytics for the best customer experience and insights. Velocity of insight is key to business leadership.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133257109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LightOn Optical Processing Unit : Scaling-up AI and HPC with a Non von Neumann co-processor","authors":"C. Brossollet, Alessandro Cappelli, I. Carron, Charidimos Chaintoutis, A. Chatelain, L. Daudet, S. Gigan, Daniel Hesslow, F. Krzakala, Julien Launay, Safa Mokaadi, F. Moreau, Kilian Muller, Ruben Ohana, Gustave Pariente, Iacopo Poli, G. L. Tommasone","doi":"10.1109/HCS52781.2021.9567166","DOIUrl":"https://doi.org/10.1109/HCS52781.2021.9567166","url":null,"abstract":"Beyond pure Von Neumann processing: the scalability of AI/HPC models is limited by the Von Neumann bottleneck for accessing massive amounts of memory, driving up power consumption.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131503276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Photonic co-processors in HPC: Using LightOn OPUs for Randomized Numerical Linear Algebra","authors":"Daniel Hesslow, Alessandro Cappelli, I. Carron, L. Daudet, Raphael Lafargue, Kilian Muller, Ruben Ohana, Gustave Pariente, Iacopo Poli","doi":"10.1109/HCS52781.2021.9566948","DOIUrl":"https://doi.org/10.1109/HCS52781.2021.9566948","url":null,"abstract":"Exact computation of linear algebra operations is challenging or even impossible at extreme scale. By leveraging randomization, we can get approximate results at reduced computational cost. LightOn OPU: the first commercially available photonic co-processor.","PeriodicalId":246531,"journal":{"name":"2021 IEEE Hot Chips 33 Symposium (HCS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114289319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}