Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu
arXiv:2408.04812, arXiv - CS - Emerging Technologies, published 2024-08-09
A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN
Modern Artificial Intelligence (AI) applications increasingly employ
multi-tenant deep neural networks (DNNs), leading to a significant rise in
computing complexity and in the need for computing parallelism. ReRAM-based
processing-in-memory (PIM) computing, with its high density and low power
consumption characteristics, holds promising potential for supporting the
deployment of multi-tenant DNNs. However, direct deployment of complex
multi-tenant DNNs on existing ReRAM-based PIM designs poses challenges.
Resource contention among different tenants can result in severe
under-utilization of on-chip computing resources. Moreover, area-intensive
operators and computation-intensive operators require excessively large on-chip
areas and long processing times, leading to high overall latency during
parallel computing. To address these challenges, we propose a novel ReRAM-based
in-memory computing framework that enables efficient deployment of multi-tenant
DNNs on ReRAM-based PIM designs. Our approach tackles the resource contention
problems by iteratively partitioning the PIM hardware at the tenant level. In
addition, we reconstruct a fine-grained processing pipeline at the
operator level to handle area-intensive operators. Compared to direct
deployments on traditional ReRAM-based PIM designs, our proposed PIM computing
framework achieves significant improvements in speed (ranging from 1.75x to
60.43x) and energy (up to 1.89x).
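The iterative tenant-level partitioning described above can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a hypothetical greedy scheme, assuming each tenant has an estimated demand for PIM crossbar tiles and that tiles are repeatedly moved from the most over-provisioned tenant to the most under-provisioned one until no further rebalancing is possible:

```python
def partition_tiles(demands, total_tiles, iters=100):
    """Hypothetical sketch: split `total_tiles` PIM crossbar tiles among
    tenants with per-tenant tile `demands`, rebalancing iteratively to
    reduce resource contention. Returns a tile count per tenant."""
    n = len(demands)
    # Start from an even split; hand any remainder to tenant 0.
    alloc = [total_tiles // n] * n
    alloc[0] += total_tiles - sum(alloc)
    for _ in range(iters):
        # Surplus per tenant: positive means over-provisioned,
        # negative means this tenant's demand is not yet met.
        surplus = [(alloc[i] - demands[i], i) for i in range(n)]
        donor = max(surplus)   # most over-provisioned tenant
        taker = min(surplus)   # most under-provisioned tenant
        if donor[0] <= 0 or taker[0] >= 0:
            break              # no tenant can donate, or none needs more
        alloc[donor[1]] -= 1   # move one tile per iteration
        alloc[taker[1]] += 1
    return alloc

# Example: three tenants with uneven demand sharing 12 tiles.
print(partition_tiles([10, 2, 4], 12))
```

The fixed-point loop mirrors the "iterative" aspect of the framework at a coarse grain; the actual hardware partitioner would also have to account for operator placement and pipeline constraints, which this sketch omits.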