Locality Aware Memory Assignment and Tiling
Samuel Rogers, H. Tabkhi
2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6, June 2018
DOI: 10.1145/3195970.3196070
Citations: 6
Abstract
With the trend toward specialization, an efficient memory-path design is vital to capitalize on customization in the data path. A monolithic memory hierarchy is often highly inefficient for irregular applications, which have traditionally been targeted at CPUs. New approaches and tools are required to offer application-specific memory customization that combines the benefits of cache and scratchpad memory simultaneously. This paper introduces a novel approach for automated, application-specific on-chip memory assignment and tiling. The approach offers two major tools: (1) static memory access analysis and (2) variable-level memory assignment. (1) Static memory analysis operates at the LLVM abstraction level: it extracts target-independent pointer behaviors, measures access strides, and analyzes the prefetchability of variables. (2) Variable-level memory assignment builds a memory allocation graph that assigns each variable to cache or scratchpad based on the variable's size and estimated locality. It also explores opportunities for tiling memory accesses. For exploration and evaluation, this paper uses the MachSuite benchmarks (covering both regular and irregular memory access behaviors), with the gem5-Aladdin tool estimating performance and power. The proposed approach optimizes the memory hierarchy by automatically combining the benefits of cache and (tiled) scratchpad at variable-level granularity for each individual application. The results show an average improvement of more than 45% in our power-stall product over a monolithic cache or scratchpad design.
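The idea of routing variables to cache or scratchpad by access behavior can be illustrated with a small Python sketch. It is not the authors' tool. The paper builds a memory allocation graph from static analysis at the LLVM level, whereas this sketch assumes a per-variable sequence of accessed element indices (`index_trace`), a greedy size-ordered heuristic, and a fixed tile size (`tile_bytes`). All names, sizes, and capacities here are hypothetical. The sketch only shows the basic intuition: constant-stride (prefetchable) variables suit a scratchpad, possibly tiled, while irregular ones are left to the cache.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variable:
    """A program variable (array) as seen by this hypothetical sketch."""
    name: str
    size_bytes: int
    # Hypothetical input: element indices the kernel touches, in program order.
    # The paper derives access behavior statically from LLVM; a concrete
    # index sequence stands in for that analysis here.
    index_trace: List[int]


def access_stride(indices: List[int]) -> Optional[int]:
    """Return the constant stride (in elements) if all consecutive accesses
    differ by the same amount; return None for irregular or too-short traces."""
    if len(indices) < 2:
        return None
    deltas = {b - a for a, b in zip(indices, indices[1:])}
    return deltas.pop() if len(deltas) == 1 else None


def assign_memory(variables: List[Variable], spm_bytes: int,
                  tile_bytes: int) -> Dict[str, str]:
    """Greedy cache/scratchpad assignment (a stand-in heuristic, not the
    paper's allocation-graph method).

    - Irregular (no constant stride) variables go to the cache.
    - Constant-stride variables go to the scratchpad whole if they fit,
      otherwise as tiles of `tile_bytes` if one tile fits, otherwise cache.
    """
    placement: Dict[str, str] = {}
    free = spm_bytes
    # Visit smaller variables first so more of them fit without tiling.
    for var in sorted(variables, key=lambda v: v.size_bytes):
        if access_stride(var.index_trace) is None:
            placement[var.name] = "cache"
        elif var.size_bytes <= free:
            placement[var.name] = "scratchpad"
            free -= var.size_bytes
        elif tile_bytes <= free:
            placement[var.name] = "tiled-scratchpad"
            free -= tile_bytes  # only one tile buffer is reserved on chip
        else:
            placement[var.name] = "cache"
    return placement


# Usage with made-up variables and capacities (illustrative only).
example = [
    Variable("a", 4096, list(range(64))),             # unit stride, small
    Variable("b", 65536, list(range(0, 128, 2))),     # stride 2, too large to fit whole
    Variable("nodes", 2048, [5, 1, 9, 3, 7]),         # data-dependent, irregular
]
placement = assign_memory(example, spm_bytes=8192, tile_bytes=1024)
print(placement)  # {'nodes': 'cache', 'a': 'scratchpad', 'b': 'tiled-scratchpad'}
```

Visiting variables in ascending size order is just one simple choice; an allocator closer to the paper's description would also weigh each variable's estimated locality rather than relying on a binary regular/irregular split.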