AxLaM: energy-efficient accelerator design for language models for edge computing.

Impact Factor 4.3 · CAS Zone 3 (Multidisciplinary) · JCR Q1 MULTIDISCIPLINARY SCIENCES
Tom Glint, Bhumika Mittal, Santripta Sharma, Abdul Qadir Ronak, Abhinav Goud, Neerja Kasture, Zaqi Momin, Aravind Krishna, Joycee Mekie
Journal: Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 383, issue 2288, p. 20230395
DOI: 10.1098/rsta.2023.0395
Published: 2025-01-01 (Epub 2025-01-16)
Citations: 0

Abstract

Modern language models such as bidirectional encoder representations from transformers have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. The proposed data-flow-aware hardware accelerator, inspired by Simba, uses approximate fixed-point POSIT-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency over the hardware-realized scalable accelerator Simba. Compared to Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and a 1.2-times latency improvement, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on hardware. This article is part of the theme issue 'Emerging technologies for future secure computing platforms'.
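A minimal sketch of the two quantitative ideas in the abstract: the implied efficiency of the FACT baseline (1.8 TOPS/W is stated as 65% higher, so FACT ≈ 1.09 TOPS/W), and a generic truncation-based approximate fixed-point multiply. The `approx_fixed_mul` function is an illustrative assumption of how approximate multipliers trade accuracy for energy; it is not the paper's actual POSIT-based multiplier design.

```python
# Implied baseline efficiency: AxLaM's 1.8 TOPS/W is 65% higher than FACT's.
axlam_tops_per_w = 1.8
fact_tops_per_w = axlam_tops_per_w / 1.65  # ~1.09 TOPS/W

def approx_fixed_mul(a: int, b: int, frac_bits: int = 8, drop: int = 4) -> int:
    """Multiply two Q(frac_bits) fixed-point integers approximately.

    Zeroing the `drop` low-order bits of each operand before the multiply
    shrinks the partial-product array -- a common energy-saving
    approximation (hypothetical here, not AxLaM's POSIT scheme).
    """
    a_t = (a >> drop) << drop  # truncate low bits of a
    b_t = (b >> drop) << drop  # truncate low bits of b
    return (a_t * b_t) >> frac_bits  # rescale back to Q(frac_bits)

# Example: 1.0 * 1.0 in Q8 is exact (low bits already zero)...
exact = approx_fixed_mul(256, 256)      # 256, i.e. 1.0
# ...while 0.508 * 0.508 shows a small truncation error.
approx = approx_fixed_mul(130, 130)     # 64 vs. the exact 66
```

The error grows with `drop`, mirroring the usual accuracy-for-energy trade-off that approximate-multiplier designs such as AxLaM's exploit.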

Source journal metrics: CiteScore 9.30; self-citation rate 2.00%; 367 articles published per year; review time 3 months.
Journal description: Continuing its long history of influential scientific publishing, Philosophical Transactions A publishes high-quality theme issues on topics of current importance and general interest within the physical, mathematical and engineering sciences, guest-edited by leading authorities and comprising new research, reviews and opinions from prominent researchers.