H3D-Transformer:在边缘设备上加速变压器模型的异构三维(H3D)计算平台

IF 2.2 4区 计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Yandong Luo, Shimeng Yu
{"title":"H3D-Transformer:在边缘设备上加速变压器模型的异构三维(H3D)计算平台","authors":"Yandong Luo, Shimeng Yu","doi":"10.1145/3649219","DOIUrl":null,"url":null,"abstract":"<p>Prior hardware accelerator designs primarily focused on single-chip solutions for 10MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator design due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the Bert and GPT2 model, which is about 2.6 × ∼ 3.1 × higher than the baseline with 7nm TPU and stacked FeFET memory.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices\",\"authors\":\"Yandong Luo, Shimeng Yu\",\"doi\":\"10.1145/3649219\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Prior hardware accelerator designs primarily focused on single-chip solutions for 10MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator design due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the Bert and GPT2 model, which is about 2.6 × ∼ 3.1 × higher than the baseline with 7nm TPU and stacked FeFET memory.</p>\",\"PeriodicalId\":50944,\"journal\":{\"name\":\"ACM Transactions on Design Automation of Electronic Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Design Automation of Electronic Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3649219\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3649219","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

先前的硬件加速器设计主要集中于 10MB 级计算机视觉模型的单芯片解决方案。用于自然语言处理(NLP)的 GB 级转换器模型由于涉及大量参数和多种矩阵乘法(MatMul)工作负载,给现有加速器设计带来了挑战。这项工作针对变换器模型提出了一种基于三维的异构加速器设计,它采用了带有多个三维内存/逻辑混合立方体的内插基板,并针对不同的 MatMul 工作负载进行了优化。利用混合信号内存计算(CIM)和数字张量处理单元(TPU)的异构计算范例,提出了一种近似计算方案。从系统级评估结果来看,Bert 和 GPT2 模型实现了 10 TOPS/W 的能效,比采用 7nm TPU 和堆叠 FeFET 内存的基准高出约 2.6 × ∼ 3.1 ×。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Prior hardware accelerator designs primarily focused on single-chip solutions for 10MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator design due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the Bert and GPT2 model, which is about 2.6 × ∼ 3.1 × higher than the baseline with 7nm TPU and stacked FeFET memory.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
ACM Transactions on Design Automation of Electronic Systems
ACM Transactions on Design Automation of Electronic Systems 工程技术-计算机:软件工程
CiteScore
3.20
自引率
7.10%
发文量
105
审稿时长
3 months
期刊介绍: TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信