MINOTAUR: A Posit-Based 0.42–0.50-TOPS/W Edge Transformer Inference and Training Accelerator

IF 4.6 | Region 1, Engineering & Technology | JCR Q1, Engineering, Electrical & Electronic
Kartik Prabhu;Robert M. Radway;Jeffrey Yu;Kai Bartolone;Massimo Giordano;Fabian Peddinghaus;Yonatan Urman;Win-San Khwa;Yu-Der Chih;Meng-Fan Chang;Subhasish Mitra;Priyanka Raina
DOI: 10.1109/JSSC.2025.3545731
IEEE Journal of Solid-State Circuits, vol. 60, no. 4, pp. 1311-1323. Published 2025-03-06.
https://ieeexplore.ieee.org/document/10916649/
Citations: 0

Abstract

Transformer models have revolutionized natural language processing (NLP) and enabled many new applications, but they are challenging to deploy on resource-constrained edge devices because of their high computation and memory demands. We present MINOTAUR, an edge system-on-chip (SoC) for inference and fine-tuning of Transformer models with all memory on the chip. MINOTAUR uses a configurable 8-bit posit-based accelerator to achieve highly accurate and efficient inference and fine-tuning. To minimize memory power, MINOTAUR employs fine-grained spatiotemporal power gating of on-chip resistive RAM (RRAM). MINOTAUR enables on-chip fine-tuning through full-network low-rank adaptation (LoRA). MINOTAUR is fabricated in a 40-nm CMOS process, achieves ResNet-18 inference in 8.1 mJ and MobileBERT_TINY inference in 8.2 mJ, and performs on-chip fine-tuning with an accuracy that is within 1.7% of offline training.
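The abstract names an 8-bit posit format but does not spell out its encoding. As a rough illustration only, the sketch below decodes a bit pattern under the general posit layout (sign, regime run, up to `es` exponent bits, fraction). It is not a description of MINOTAUR's hardware: the function name `decode_posit` is hypothetical, and the default `es=1` is an assumption chosen for the example, since the abstract says only that the accelerator is configurable.

```python
def decode_posit(bits: int, n: int = 8, es: int = 1) -> float:
    """Decode an n-bit posit bit pattern into a Python float.

    Illustrative sketch of the generic posit layout; `es` (number of
    exponent bits) is an assumed parameter, not a value from the paper.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR ("not a real"), mapped to NaN here

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign

    # Regime: a run of identical bits, ended by the opposite bit or the end.
    pos = n - 2
    first = (body >> pos) & 1
    run = 0
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1  # skip the terminating bit, if there is one

    remaining = max(pos + 1, 0)  # bits left for exponent and fraction
    tail = body & ((1 << remaining) - 1)
    exp_bits = min(es, remaining)
    frac_bits = remaining - exp_bits

    # Exponent bits cut off by the regime are treated as zeros on the right.
    exponent = (tail >> frac_bits) << (es - exp_bits)
    fraction = tail & ((1 << frac_bits) - 1)

    scale = k * (1 << es) + exponent  # useed**k * 2**exponent as a power of two
    magnitude = (1 + fraction / (1 << frac_bits)) * 2.0 ** scale
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for pattern in (0x01, 0x40, 0x48, 0x7F, 0xC0):
        print(f"0x{pattern:02X} -> {decode_posit(pattern)}")
```

Because the regime field has variable length, values near 1.0 keep more fraction bits than very large or very small values, which is the tapered-precision property that usually motivates posits for low-bit-width neural network arithmetic.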
Source journal
IEEE Journal of Solid-State Circuits (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 11.00
Self-citation rate: 20.40%
Articles published: 351
Review time: 3-6 weeks
Journal description: The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits, with particular emphasis on transistor-level design of integrated circuits. It also covers topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.