LRMP: Layer Replication with Mixed Precision for spatial in-memory DNN accelerators.

IF 3.0 | Q2 | COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Artificial Intelligence | Pub Date: 2024-10-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1268317
Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan
{"title":"LRMP:用于空间内存 DNN 加速器的混合精度层复制。","authors":"Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan","doi":"10.3389/frai.2024.1268317","DOIUrl":null,"url":null,"abstract":"<p><p>In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1268317"},"PeriodicalIF":3.0000,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486753/pdf/","citationCount":"0","resultStr":"{\"title\":\"LRMP: Layer Replication with Mixed Precision for spatial in-memory DNN accelerators.\",\"authors\":\"Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan\",\"doi\":\"10.3389/frai.2024.1268317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. 
Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.</p>\",\"PeriodicalId\":33315,\"journal\":{\"name\":\"Frontiers in Artificial Intelligence\",\"volume\":\"7 \",\"pages\":\"1268317\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-10-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486753/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frai.2024.1268317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1268317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.
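
The abstract frames LRMP as a joint search over per-layer replication factors and weight precisions under an area budget: lowering a layer's precision shrinks its crossbar footprint, and the freed area can be spent replicating bottleneck layers to balance a pipelined mapping. The sketch below is only a toy illustration of that trade-off, using brute-force enumeration over an invented three-layer workload; the layer sizes, area/latency models, candidate settings, and accuracy penalties are all assumptions made here for illustration, not the paper's formulation or hardware model.

```python
from itertools import product

# Toy replication/precision trade-off for an area-constrained IMC mapping.
# All constants below are invented for illustration.
LAYER_MACS = [8e6, 4e6, 1e6]     # hypothetical per-layer work (MACs)
CROSSBAR_AREA = 1.0              # area of one layer's tiles at 8-bit weights
AREA_BUDGET = 12.0               # total area constraint
BITS = [4, 8]                    # candidate weight precisions
REPLICAS = [1, 2, 4]             # candidate replication factors
ACC_PENALTY = {8: 0.0, 4: 0.4}   # assumed accuracy loss (%) per 4-bit layer

def layer_area(bits, replicas):
    # Lower precision shrinks a layer's footprint; replication scales it up.
    return CROSSBAR_AREA * (bits / 8) * replicas

def layer_latency(macs, replicas):
    # Replicating a layer lets it process inputs in parallel.
    return macs / replicas

best = None
for choice in product(product(BITS, REPLICAS), repeat=len(LAYER_MACS)):
    area = sum(layer_area(b, r) for b, r in choice)
    if area > AREA_BUDGET:
        continue
    # Pipeline throughput is limited by the slowest (bottleneck) layer.
    bottleneck = max(layer_latency(m, r) for m, (_, r) in zip(LAYER_MACS, choice))
    acc_loss = sum(ACC_PENALTY[b] for b, _ in choice)
    if acc_loss > 1.0:            # keep total accuracy degradation under 1%
        continue
    if best is None or bottleneck < best[0]:
        best = (bottleneck, choice, area, acc_loss)

bottleneck, choice, area, acc_loss = best
print("per-layer (bits, replicas):", choice)
print("bottleneck latency (a.u.):", bottleneck)
print("area used:", area, "of", AREA_BUDGET)
print("estimated accuracy loss (%):", acc_loss)
```

In the paper itself this design space is searched with a combination of reinforcement learning and mixed integer linear programming over a model informed by the target hardware; the enumeration above only makes the structure of the replication-quantization space concrete.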

Source journal: Frontiers in Artificial Intelligence
CiteScore: 6.10 | Self-citation rate: 2.50% | Articles published: 272 | Review time: 13 weeks