3.2 The A100 Datacenter GPU and Ampere Architecture

Jack Choquette, Ming-Ju Edward Lee, R. Krashinsky, V. Balan, Brucek Khailany
{"title":"3.2 The A100 Datacenter GPU and Ampere Architecture","authors":"Jack Choquette, Ming-Ju Edward Lee, R. Krashinsky, V. Balan, Brucek Khailany","doi":"10.1109/ISSCC42613.2021.9365803","DOIUrl":null,"url":null,"abstract":"The diversity of compute-intensive applications in modern cloud data centers has driven the explosion of GPU-accelerated cloud computing. Such applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, and cloud gaming. The A100 GPU introduces several features targeting these workloads: a $3^{rd}-$generation Tensor Core with support for fine-grained sparsity, new BFIoat16 (BF16), TensorFIoat-32 (TF32), and FP64 datatypes, scale-out support with multi-instance GPU (MIG) virtualization, and scale-up support with a $3^{rd}-$generation 50Gbps NVLink I/0 interface (NVLink3) and NVSwitch inter-GPU communication. As shown in Fig. 3.2.1, A100 contains 108 Streaming Multiprocessors (SMs) and 6912 CUDA cores. The SMs are fed by a 40MB L2 cache and 1. 56TB/s of HBM2 memory bandwidth (BW). At 1.41GHz, A100 provides an effective peak 1248T0PS (8b integers), 624TFLOPS (FP16) and312TFLOPS (TF32) when including sparsity optimizations. Implemented in a TSMC 7nm N7 process, the A100 die (Fig. 3.2.7) contains 54B transistors and measures 826mm2.","PeriodicalId":371093,"journal":{"name":"2021 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Solid- State Circuits Conference (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC42613.2021.9365803","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

The diversity of compute-intensive applications in modern cloud data centers has driven the explosion of GPU-accelerated cloud computing. Such applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, and cloud gaming. The A100 GPU introduces several features targeting these workloads: a $3^{rd}-$generation Tensor Core with support for fine-grained sparsity, new BFIoat16 (BF16), TensorFIoat-32 (TF32), and FP64 datatypes, scale-out support with multi-instance GPU (MIG) virtualization, and scale-up support with a $3^{rd}-$generation 50Gbps NVLink I/0 interface (NVLink3) and NVSwitch inter-GPU communication. As shown in Fig. 3.2.1, A100 contains 108 Streaming Multiprocessors (SMs) and 6912 CUDA cores. The SMs are fed by a 40MB L2 cache and 1. 56TB/s of HBM2 memory bandwidth (BW). At 1.41GHz, A100 provides an effective peak 1248T0PS (8b integers), 624TFLOPS (FP16) and312TFLOPS (TF32) when including sparsity optimizations. Implemented in a TSMC 7nm N7 process, the A100 die (Fig. 3.2.7) contains 54B transistors and measures 826mm2.
3.2 A100数据中心GPU与安培架构
现代云数据中心中计算密集型应用程序的多样性推动了gpu加速云计算的爆炸式增长。这些应用包括人工智能深度学习训练和推理、数据分析、科学计算、基因组学、边缘视频分析和5G服务、图形渲染和云游戏。A100 GPU针对这些工作负载引入了几个特性:支持细粒度稀疏的$3^{rd}-$代张量核心,支持新的BFIoat16 (BF16), tensorfio32 (TF32)和FP64数据类型,支持多实例GPU (MIG)虚拟化的横向扩展,以及支持$3^{rd}-$代50Gbps NVLink I/0接口(NVLink3)和NVSwitch GPU间通信的横向扩展。如图3.2.1所示,A100包含108个流式多处理器(SMs)和6912个CUDA内核。SMs由40MB二级缓存和1。56TB/s HBM2内存带宽(BW)。在1.41GHz时,A100提供1248T0PS (8b整数),624TFLOPS (FP16)和312tflops (TF32)的有效峰值,包括稀疏性优化。A100芯片采用台积电7nm N7工艺,包含54B个晶体管,尺寸为826mm2。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信