Title: A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN
Authors: Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu
arXiv: 2408.04812 (https://doi.org/arxiv-2408.04812)
Venue: arXiv - CS - Emerging Technologies
Published: 2024-08-09
Citations: 0
Abstract
Modern Artificial Intelligence (AI) applications increasingly employ
multi-tenant deep neural networks (DNNs), which leads to a significant rise in
computing complexity and the need for computing parallelism. ReRAM-based
processing-in-memory (PIM) computing, with its high density and low power
consumption, holds promising potential for supporting the deployment of
multi-tenant DNNs. However, directly deploying complex multi-tenant DNNs on
existing ReRAM-based PIM designs poses challenges. Resource contention among
tenants can result in severe under-utilization of on-chip computing resources.
Moreover, area-intensive operators and computation-intensive operators require
excessively large on-chip areas and long processing times, leading to high
overall latency during parallel computing. To address these challenges, we
propose a novel ReRAM-based in-memory computing framework that enables
efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our
approach tackles the resource contention problem by iteratively partitioning
the PIM hardware at the tenant level. In addition, we construct a fine-grained
reconstructed processing pipeline at the operator level to handle
area-intensive operators. Compared to direct deployments on traditional
ReRAM-based PIM designs, our proposed PIM computing framework achieves
significant improvements in speed (from 1.75x to 60.43x) and energy efficiency
(up to 1.89x).
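The abstract does not specify the partitioning algorithm, but the general idea of iteratively dividing a fixed pool of ReRAM crossbars among tenants can be sketched as follows. This is a hypothetical illustration, not the paper's method: tenant demands, the greedy proportional allocation, and the demand/allocation rebalancing heuristic are all assumptions made for the example.

```python
# Toy sketch of tenant-level iterative partitioning (NOT the paper's
# algorithm): split a fixed pool of crossbars among tenants in proportion
# to their demand, so no single tenant starves the others.

def partition_crossbars(demands, total_crossbars):
    """Iteratively split `total_crossbars` among tenants by relative demand.

    demands: dict mapping tenant name -> crossbars the tenant would
             ideally occupy; total_crossbars: size of the shared PIM pool.
    """
    total_demand = sum(demands.values())
    # Initial proportional share, at least one crossbar per tenant.
    alloc = {t: max(1, (d * total_crossbars) // total_demand)
             for t, d in demands.items()}
    # Iteratively hand out leftover crossbars (or reclaim excess ones),
    # one at a time, based on each tenant's demand/allocation ratio.
    while sum(alloc.values()) != total_crossbars:
        if sum(alloc.values()) < total_crossbars:
            # Most under-served tenant gets the next crossbar.
            t = max(demands, key=lambda t: demands[t] / alloc[t])
            alloc[t] += 1
        else:
            # Most over-served tenant (above the one-crossbar floor)
            # gives one back.
            t = min((t for t in demands if alloc[t] > 1),
                    key=lambda t: demands[t] / alloc[t])
            alloc[t] -= 1
    return alloc

print(partition_crossbars({"A": 60, "B": 30, "C": 10}, 16))
```

The rebalancing loop converges because each iteration moves the total allocation one crossbar closer to the pool size while keeping every tenant at a nonzero share.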