Pack my weights and run! Minimizing overheads for in-memory computing accelerators
Pouya Houshmand, Marian Verhelst
arXiv - EE - Image and Video Processing, 2024-09-15
https://doi.org/arxiv-2409.11437
Abstract
In-memory computing (IMC) hardware accelerators allow more than 10x improvements in
peak efficiency and performance for matrix-vector multiplications (MVM)
compared to conventional digital designs. For this reason, they have attracted great
interest for the acceleration of neural network workloads. Nevertheless, these
potential gains are only achieved when the utilization of the computational
resources is maximized and the overhead from loading operands into the memory
array is minimized. To this end, this paper proposes a novel mapping algorithm for
the weights in the IMC macro, based on efficient packing of the weights of
network layers in the available memory. The algorithm 1) minimizes weight
loading times while 2) maximally exploiting the
parallelism of the IMC computational fabric. A set of case studies is carried
out to show achievable trade-offs for the MLPerf Tiny benchmark
\cite{mlperftiny} on IMC architectures, with potential $10$-$100\times$ energy-delay product (EDP)
improvements.
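
To make the packing idea concrete, the following is a minimal sketch of a greedy shelf-packing heuristic that places per-layer weight tiles into one fixed-size memory array. It is not the algorithm proposed in the paper, which the abstract does not specify. The names `Tile`, `pack_tiles`, and `utilization`, as well as all tile and array dimensions, are hypothetical and chosen only for illustration. The sketch shows how packing several layers into one array fill can reduce the number of weight reloads and raise array utilization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical weight tile of one layer, measured in memory cells."""
    name: str
    rows: int
    cols: int


def pack_tiles(tiles, array_rows, array_cols):
    """Greedy shelf packing (illustrative only, not the paper's method).

    Returns a list of array fills. Each fill is a list of
    (tile name, top row, left column) placements, and each fill
    corresponds to one complete rewrite of the array contents.
    """
    fills, placements = [], []
    shelf_top = shelf_height = cursor = 0
    # Tallest tiles first, so the first tile of a shelf sets its height.
    for t in sorted(tiles, key=lambda t: t.rows, reverse=True):
        if t.rows > array_rows or t.cols > array_cols:
            raise ValueError(f"tile {t.name} does not fit in the array")
        if cursor + t.cols > array_cols:
            # Current shelf is full: open a new shelf below it.
            shelf_top += shelf_height
            shelf_height = cursor = 0
        if shelf_top + t.rows > array_rows:
            # Array is full: close this fill and start a fresh one.
            fills.append(placements)
            placements = []
            shelf_top = shelf_height = cursor = 0
        placements.append((t.name, shelf_top, cursor))
        shelf_height = max(shelf_height, t.rows)
        cursor += t.cols
    if placements:
        fills.append(placements)
    return fills


def utilization(tiles, n_fills, array_rows, array_cols):
    """Fraction of cells holding weights, averaged over all fills."""
    used = sum(t.rows * t.cols for t in tiles)
    return used / (n_fills * array_rows * array_cols)


# Hypothetical layer tiles and a hypothetical 128 x 64 array.
layers = [Tile("A", 64, 32), Tile("B", 64, 32),
          Tile("C", 32, 64), Tile("D", 32, 32)]
fills = pack_tiles(layers, array_rows=128, array_cols=64)
print(len(fills), utilization(layers, len(fills), 128, 64))
```

With these assumed dimensions, all four tiles share a single array fill, whereas mapping one layer per fill would rewrite the array four times. A real mapper would also have to weigh how a placement affects the parallelism available to each layer, which this toy heuristic ignores.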