A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu
{"title":"A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN","authors":"Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu","doi":"arxiv-2408.04812","DOIUrl":null,"url":null,"abstract":"Modern Artificial Intelligence (AI) applications are increasingly utilizing\nmulti-tenant deep neural networks (DNNs), which lead to a significant rise in\ncomputing complexity and the need for computing parallelism. ReRAM-based\nprocessing-in-memory (PIM) computing, with its high density and low power\nconsumption characteristics, holds promising potential for supporting the\ndeployment of multi-tenant DNNs. However, direct deployment of complex\nmulti-tenant DNNs on exsiting ReRAM-based PIM designs poses challenges.\nResource contention among different tenants can result in sever\nunder-utilization of on-chip computing resources. Moreover, area-intensive\noperators and computation-intensive operators require excessively large on-chip\nareas and long processing times, leading to high overall latency during\nparallel computing. To address these challenges, we propose a novel ReRAM-based\nin-memory computing framework that enables efficient deployment of multi-tenant\nDNNs on ReRAM-based PIM designs. Our approach tackles the resource contention\nproblems by iteratively partitioning the PIM hardware at tenant level. In\naddition, we construct a fine-grained reconstructed processing pipeline at the\noperator level to handle area-intensive operators. Compared to the direct\ndeployments on traditional ReRAM-based PIM designs, our proposed PIM computing\nframework achieves significant improvements in speed (ranges from 1.75x to\n60.43x) and energy(up to 1.89x).","PeriodicalId":501168,"journal":{"name":"arXiv - CS - Emerging Technologies","volume":"2011 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Emerging Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04812","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Modern Artificial Intelligence (AI) applications are increasingly utilizing multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption characteristics, holds promising potential for supporting the deployment of multi-tenant DNNs. However, direct deployment of complex multi-tenant DNNs on exsiting ReRAM-based PIM designs poses challenges. Resource contention among different tenants can result in sever under-utilization of on-chip computing resources. Moreover, area-intensive operators and computation-intensive operators require excessively large on-chip areas and long processing times, leading to high overall latency during parallel computing. To address these challenges, we propose a novel ReRAM-based in-memory computing framework that enables efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our approach tackles the resource contention problems by iteratively partitioning the PIM hardware at tenant level. In addition, we construct a fine-grained reconstructed processing pipeline at the operator level to handle area-intensive operators. Compared to the direct deployments on traditional ReRAM-based PIM designs, our proposed PIM computing framework achieves significant improvements in speed (ranges from 1.75x to 60.43x) and energy(up to 1.89x).
多租户 DNN 的 PIM 协同计算优化框架
现代人工智能(AI)应用越来越多地使用多租户深度神经网络(DNN),这导致计算复杂度和计算并行性需求大幅上升。基于ReRAM的内存处理(PIM)计算具有高密度和低功耗的特点,在支持多租户深度神经网络部署方面具有广阔的前景。然而,在现有的基于 ReRAM 的 PIM 设计上直接部署复杂的多租户 DNN 会带来挑战。此外,面积密集型运算器和计算密集型运算器需要过大的片上面积和过长的处理时间,从而导致并行计算过程中的整体延迟过高。为了应对这些挑战,我们提出了一种新颖的基于 ReRAM 的内存计算框架,可以在基于 ReRAM 的 PIM 设计上高效部署多租户 DNN。我们的方法通过在租户级别对 PIM 硬件进行迭代分区来解决资源争用问题。此外,我们还在操作员级别构建了细粒度重构处理流水线,以处理区域密集型操作员。与基于 ReRAM 的传统 PIM 设计上的直接部署相比,我们提出的 PIM 计算框架在速度(1.75 倍至 60.43 倍)和能耗(高达 1.89 倍)方面实现了显著改善。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信