LRMP: Layer Replication with Mixed Precision for spatial in-memory DNN accelerators.

IF 3.0 | Q2 | COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Artificial Intelligence | Pub Date: 2024-10-04 | eCollection Date: 2024-01-01 | DOI: 10.3389/frai.2024.1268317
Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan
{"title":"LRMP:用于空间内存 DNN 加速器的混合精度层复制。","authors":"Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan","doi":"10.3389/frai.2024.1268317","DOIUrl":null,"url":null,"abstract":"<p><p>In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1268317"},"PeriodicalIF":3.0000,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486753/pdf/","citationCount":"0","resultStr":"{\"title\":\"LRMP: Layer Replication with Mixed Precision for spatial in-memory DNN accelerators.\",\"authors\":\"Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan\",\"doi\":\"10.3389/frai.2024.1268317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. 
Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.</p>\",\"PeriodicalId\":33315,\"journal\":{\"name\":\"Frontiers in Artificial Intelligence\",\"volume\":\"7 \",\"pages\":\"1268317\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-10-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486753/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frai.2024.1268317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1268317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvement at minimal (<1%) degradation in accuracy.
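
The abstract frames LRMP as a joint search over per-layer replication factors and weight precisions under an area budget: lowering a layer's precision shrinks its crossbar footprint, and the freed area can be spent replicating bottleneck layers to balance a pipelined mapping. The sketch below is only a toy illustration of that trade-off, using brute-force enumeration over an invented three-layer workload; the layer sizes, area/latency models, candidate settings, and accuracy penalties are all assumptions made here for illustration, not the paper's formulation or hardware model.

```python
from itertools import product

# Toy replication/precision trade-off for an area-constrained IMC mapping.
# All constants below are invented for illustration.
LAYER_MACS = [8e6, 4e6, 1e6]     # hypothetical per-layer work (MACs)
CROSSBAR_AREA = 1.0              # area of one layer's tiles at 8-bit weights
AREA_BUDGET = 12.0               # total area constraint
BITS = [4, 8]                    # candidate weight precisions
REPLICAS = [1, 2, 4]             # candidate replication factors
ACC_PENALTY = {8: 0.0, 4: 0.4}   # assumed accuracy loss (%) per 4-bit layer

def layer_area(bits, replicas):
    # Lower precision shrinks a layer's footprint; replication scales it up.
    return CROSSBAR_AREA * (bits / 8) * replicas

def layer_latency(macs, replicas):
    # Replicating a layer lets it process inputs in parallel.
    return macs / replicas

best = None
for choice in product(product(BITS, REPLICAS), repeat=len(LAYER_MACS)):
    area = sum(layer_area(b, r) for b, r in choice)
    if area > AREA_BUDGET:
        continue
    # Pipeline throughput is limited by the slowest (bottleneck) layer.
    bottleneck = max(layer_latency(m, r) for m, (_, r) in zip(LAYER_MACS, choice))
    acc_loss = sum(ACC_PENALTY[b] for b, _ in choice)
    if acc_loss > 1.0:            # keep total accuracy degradation under 1%
        continue
    if best is None or bottleneck < best[0]:
        best = (bottleneck, choice, area, acc_loss)

bottleneck, choice, area, acc_loss = best
print("per-layer (bits, replicas):", choice)
print("bottleneck latency (a.u.):", bottleneck)
print("area used:", area, "of", AREA_BUDGET)
print("estimated accuracy loss (%):", acc_loss)
```

In the paper itself this design space is searched with a combination of reinforcement learning and mixed integer linear programming over a model informed by the target hardware; the enumeration above only makes the structure of the replication-quantization space concrete.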

Source journal: Frontiers in Artificial Intelligence
CiteScore: 6.10 | Self-citation rate: 2.50% | Articles published: 272 | Review time: 13 weeks