MINOTAUR: A Posit-Based 0.42–0.50-TOPS/W Edge Transformer Inference and Training Accelerator

Impact factor: 5.6 · CAS Zone 1 (Engineering & Technology) · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Journal of Solid-State Circuits · Pub Date: 2025-03-06 · DOI: 10.1109/JSSC.2025.3545731

Kartik Prabhu;Robert M. Radway;Jeffrey Yu;Kai Bartolone;Massimo Giordano;Fabian Peddinghaus;Yonatan Urman;Win-San Khwa;Yu-Der Chih;Meng-Fan Chang;Subhasish Mitra;Priyanka Raina

IEEE Journal of Solid-State Circuits, vol. 60, no. 4, pp. 1311-1323. https://ieeexplore.ieee.org/document/10916649/

Citations: 0

Abstract


MINOTAUR: A Posit-Based 0.42–0.50-TOPS/W Edge Transformer Inference and Training Accelerator

Transformer models have revolutionized natural language processing (NLP) and enabled many new applications, but they are challenging to deploy on resource-constrained edge devices because of their high computation and memory demands. We present MINOTAUR, an edge system-on-chip (SoC) for inference and fine-tuning of Transformer models with all memory on chip. MINOTAUR uses a configurable 8-bit posit-based accelerator to achieve highly accurate and efficient inference and fine-tuning. To minimize memory power, MINOTAUR employs fine-grained spatiotemporal power gating of on-chip resistive RAM (RRAM). MINOTAUR enables on-chip fine-tuning through full-network low-rank adaptation (LoRA). Fabricated in a 40-nm CMOS process, MINOTAUR achieves ResNet-18 inference in 8.1 mJ and MobileBERT-TINY inference in 8.2 mJ, and performs on-chip fine-tuning with an accuracy that is within 1.7% of offline training.
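To make the "8-bit posit" number format concrete, the following is a minimal sketch of decoding an 8-bit posit into a floating-point value, following the general posit encoding (sign, regime, exponent, fraction). It is not taken from the paper and does not describe MINOTAUR's datapath: the function name `decode_posit8` is hypothetical, and the exponent width `es` is left as a parameter because the abstract does not say which configurations the chip supports.

```python
def decode_posit8(bits: int, es: int = 1) -> float:
    """Decode an 8-bit posit with `es` exponent bits (illustrative sketch only)."""
    bits &= 0xFF
    if bits == 0x00:
        return 0.0
    if bits == 0x80:
        return float("nan")  # NaR ("not a real")

    sign = bits >> 7
    if sign:
        bits = (-bits) & 0xFF  # negative posits are stored in two's complement

    body = bits & 0x7F  # the 7 bits after the sign bit

    # Regime: a run of identical bits, ended by the opposite bit or the word end.
    first = (body >> 6) & 1
    run = 0
    i = 6
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run

    i -= 1  # skip the regime terminator bit, if there is one
    remaining = max(i + 1, 0)  # bits left for exponent and fraction
    rest = body & ((1 << remaining) - 1)

    # Exponent: up to `es` bits; bits cut off by the word end count as zeros.
    exp_bits = min(es, remaining)
    exp = (rest >> (remaining - exp_bits)) << (es - exp_bits)

    # Fraction: whatever is left, with an implicit leading 1.
    frac_bits = remaining - exp_bits
    frac = rest & ((1 << frac_bits) - 1)
    mantissa = 1.0 + frac / (1 << frac_bits)

    value = mantissa * 2.0 ** (k * (1 << es) + exp)
    return -value if sign else value


if __name__ == "__main__":
    for code in (0x01, 0x40, 0x48, 0x50, 0x7F, 0xC0):
        print(f"0x{code:02X} -> {decode_posit8(code, es=1)}")
```

Because the regime field has variable length, values near 1.0 keep more fraction bits while very large and very small values trade precision for range. That tapered precision is the usual argument for posits over fixed-point or 8-bit floating-point formats in neural-network workloads.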


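Source journal

IEEE Journal of Solid-State Circuits (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.00
Self-citation rate: 20.40%
Articles published: 351
Review time: 3-6 weeks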

About the journal: The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits, with particular emphasis on transistor-level design of integrated circuits. It also covers topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.