GAROS: Genetic algorithm-aided row-skipping for shift and duplicate kernel mapping in processing-in-memory architectures
Johnny Rhe, Kang Eun Jeon, Jong Hwan Ko
Journal of Systems Architecture, Vol. 165, Article 103423, published 2025-05-12
DOI: 10.1016/j.sysarc.2025.103423
https://www.sciencedirect.com/science/article/pii/S1383762125000955
Citations: 0
Abstract
Processing-in-memory (PIM) architecture is becoming a promising candidate for convolutional neural network (CNN) inference. A recent mapping method, shift and duplicate kernel (SDK), reduces latency by improving array utilization: it shifts copies of the same kernels into idle columns. Although pattern-based pruning effectively enables row-skipping, traditional pattern designs are suboptimal for SDK mapping because the irregular kernel shifts complicate row-skipping. To address this, we previously proposed pruning-aided row-skipping (PAIRS), which adopts SDK-optimized layer-wise patterns. However, PAIRS has two key limitations. First, because it uses a single pattern set, it offers only discrete levels of row-skipping, which restricts precise control over weight-matrix compression across varying layer and array sizes. Second, it risks accuracy loss by pruning critical weights. To overcome these challenges, we introduce genetic algorithm-aided row-skipping (GAROS), which employs input channel (IC)-wise patterns. GAROS enables finer control over row-skipping by assigning several pattern sets and selecting an optimal pattern for each IC to preserve critical weights. Consequently, this approach enables continuous weight-matrix compression while balancing the trade-off between row-skipping and accuracy. Simulation results on WRN16-4 demonstrate that GAROS improved accuracy by up to +2.4% compared to PAIRS and achieved up to a 1.74× speedup compared to the baseline when a 128 × 128 sub-array is used.
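The abstract describes GAROS only at a high level. The sketch below illustrates the general idea of IC-wise pattern selection by a genetic algorithm under simplifying assumptions that are ours, not the paper's: a plain im2col mapping with SDK shifts ignored (so a position pruned by an input channel's pattern is zero in every output-channel column and its crossbar row can be skipped), two hypothetical pattern sets of 4 and 5 entries, and a fitness that maximizes retained L1 weight magnitude under a row budget. The GA operators (tournament selection, uniform crossover, per-gene mutation, elitism) are generic choices, and the names `PATTERNS`, `importance_table`, `rows_kept`, `fitness`, and `evolve` are hypothetical.

```python
import random

# Hypothetical IC-wise pattern candidates over a flattened 3x3 kernel
# (positions 0..8). Two sets of different sizes (4- and 5-entry) stand in for
# "several pattern sets"; the actual GAROS pattern sets are not given here.
PATTERNS = [
    [1, 3, 4, 5], [0, 1, 3, 4], [4, 5, 7, 8],   # 4-entry set
    [1, 3, 4, 5, 7], [0, 2, 4, 6, 8],           # 5-entry set
]


def importance_table(weights, patterns):
    """imp[ic][p]: L1 magnitude that pattern p keeps in input channel ic,
    summed over every output channel (i.e., every crossbar column)."""
    n_oc, n_ic = len(weights), len(weights[0])
    return [[sum(abs(weights[oc][ic][pos]) for oc in range(n_oc) for pos in pat)
             for pat in patterns]
            for ic in range(n_ic)]


def rows_kept(genome, patterns):
    # Plain im2col mapping (SDK shifts ignored): each IC occupies 9 rows, and a
    # pruned position is zero in every column, so only kept positions need to
    # be activated; the remaining rows are skipped.
    return sum(len(patterns[g]) for g in genome)


def fitness(genome, patterns, imp, row_budget):
    # Assumed objective: maximize retained magnitude under a row budget;
    # over-budget genomes get a negative score so feasible ones always win.
    excess = rows_kept(genome, patterns) - row_budget
    if excess > 0:
        return -float(excess)
    return sum(imp[ic][g] for ic, g in enumerate(genome))


def evolve(weights, patterns, row_budget, pop_size=40, generations=50,
           mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    imp = importance_table(weights, patterns)
    n_ic, n_pat = len(imp), len(patterns)

    def score(g):
        return fitness(g, patterns, imp, row_budget)

    # Genome: one pattern index per input channel.
    pop = [[rng.randrange(n_pat) for _ in range(n_ic)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = [list(g) for g in pop[:2]]  # elitism keeps the best score non-decreasing
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [rng.randrange(n_pat) if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)


if __name__ == "__main__":
    rng = random.Random(42)
    n_oc, n_ic = 4, 8
    w = [[[rng.gauss(0.0, 1.0) for _ in range(9)] for _ in range(n_ic)]
         for _ in range(n_oc)]
    budget = 36  # rows allowed out of 8 * 9 = 72 (hypothetical target)
    best = evolve(w, PATTERNS, budget)
    kept = rows_kept(best, PATTERNS)
    print("patterns per IC:", best)
    print(f"rows kept {kept}/{n_ic * 9}, skipped {n_ic * 9 - kept}")
```

With only the 4-entry set, the demo's 8-IC layer would always keep exactly 32 rows; mixing the two sets lets the kept-row count land anywhere from 32 to 40, which is the kind of finer-grained compression the abstract attributes to assigning several pattern sets. Modeling SDK's shifted kernel copies would require checking row occupancy across the shifted windows, which this sketch does not attempt.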
Journal description:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.