Title: A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN
Authors: Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu
arXiv: 2408.04812 (https://doi.org/arxiv-2408.04812)
Venue: arXiv - CS - Emerging Technologies
Published: 2024-08-09
Citations: 0
Abstract
Modern Artificial Intelligence (AI) applications increasingly employ
multi-tenant deep neural networks (DNNs), which leads to a significant rise in
computing complexity and the need for computing parallelism. ReRAM-based
processing-in-memory (PIM) computing, with its high density and low power
consumption, holds promising potential for supporting the deployment of
multi-tenant DNNs. However, directly deploying complex multi-tenant DNNs on
existing ReRAM-based PIM designs poses challenges. Resource contention among
tenants can result in severe under-utilization of on-chip computing resources.
Moreover, area-intensive operators and computation-intensive operators require
excessively large on-chip areas and long processing times, leading to high
overall latency during parallel computing. To address these challenges, we
propose a novel ReRAM-based in-memory computing framework that enables
efficient deployment of multi-tenant DNNs on ReRAM-based PIM designs. Our
approach tackles the resource contention problem by iteratively partitioning
the PIM hardware at the tenant level. In addition, we construct a fine-grained
reconstructed processing pipeline at the operator level to handle
area-intensive operators. Compared to direct deployments on traditional
ReRAM-based PIM designs, our proposed PIM computing framework achieves
significant improvements in speed (from 1.75x to 60.43x) and energy efficiency
(up to 1.89x).
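The abstract does not specify the partitioning algorithm, but the general idea of iteratively dividing a fixed pool of ReRAM crossbars among tenants can be sketched as follows. This is a hypothetical illustration, not the paper's method: tenant demands, the greedy proportional allocation, and the demand/allocation rebalancing heuristic are all assumptions made for the example.

```python
# Toy sketch of tenant-level iterative partitioning (NOT the paper's
# algorithm): split a fixed pool of crossbars among tenants in proportion
# to their demand, so no single tenant starves the others.

def partition_crossbars(demands, total_crossbars):
    """Iteratively split `total_crossbars` among tenants by relative demand.

    demands: dict mapping tenant name -> crossbars the tenant would
             ideally occupy; total_crossbars: size of the shared PIM pool.
    """
    total_demand = sum(demands.values())
    # Initial proportional share, at least one crossbar per tenant.
    alloc = {t: max(1, (d * total_crossbars) // total_demand)
             for t, d in demands.items()}
    # Iteratively hand out leftover crossbars (or reclaim excess ones),
    # one at a time, based on each tenant's demand/allocation ratio.
    while sum(alloc.values()) != total_crossbars:
        if sum(alloc.values()) < total_crossbars:
            # Most under-served tenant gets the next crossbar.
            t = max(demands, key=lambda t: demands[t] / alloc[t])
            alloc[t] += 1
        else:
            # Most over-served tenant (above the one-crossbar floor)
            # gives one back.
            t = min((t for t in demands if alloc[t] > 1),
                    key=lambda t: demands[t] / alloc[t])
            alloc[t] -= 1
    return alloc

print(partition_crossbars({"A": 60, "B": 30, "C": 10}, 16))
```

The rebalancing loop converges because each iteration moves the total allocation one crossbar closer to the pool size while keeping every tenant at a nonzero share.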