Integrating software caches with scratch pad memory

Prasenjit Chakraborty, P. Panda
{"title":"Integrating software caches with scratch pad memory","authors":"Prasenjit Chakraborty, P. Panda","doi":"10.1145/2380403.2380440","DOIUrl":null,"url":null,"abstract":"Software cache refers to cache functionality emulated in software on a compiler-controlled Scratch Pad Memory (SPM). Such structures are useful when standard SPM allocation strategies cannot be used due to hard-to-analyze memory reference patterns in the source code. SPM data allocation strategies generally rely on compile-time inference of spatial and temporal reuse, with the general flow being the copying of a block/tile of array data into the SPM, followed by its processing, and finally, copying back. However, when array index functions are complicated due to conditionals, complex expressions, and dependence on run-time data, the SPM compiler has to rely on expensive DMA for individual words, leading to poor performance. Software caches (SWC) can play a crucial role in improving performance under such circumstances -- their access times are longer than those for direct SPM access, but they retain the advantages (present in hardware caches) of exploiting spatial and temporal locality discovered at run-time. We present the first automated compiler data allocation strategy that considers the presence of a software cache in SPM space, and makes decisions on which arrays should be accessed through it, at which times. Arrays could be accessed differently in different parts of a program, and our algorithm analyzes such uses and considers the possibility of selectively accessing an array through the SWC only when it is efficient, based on a cost model of the overheads involved in SPM/SWC transitions. We implemented our technique in an LLVM based framework and experimented with several applications on a Cell based machine. Our technique results in up to 82% overall performance improvement over a conventional SPM mapping algorithm and up to 27% over a typical SWC-enhanced implementation.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2380403.2380440","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Software cache refers to cache functionality emulated in software on a compiler-controlled Scratch Pad Memory (SPM). Such structures are useful when standard SPM allocation strategies cannot be used due to hard-to-analyze memory reference patterns in the source code. SPM data allocation strategies generally rely on compile-time inference of spatial and temporal reuse, with the general flow being the copying of a block/tile of array data into the SPM, followed by its processing, and finally, copying back. However, when array index functions are complicated due to conditionals, complex expressions, and dependence on run-time data, the SPM compiler has to rely on expensive DMA for individual words, leading to poor performance. Software caches (SWC) can play a crucial role in improving performance under such circumstances -- their access times are longer than those for direct SPM access, but they retain the advantages (present in hardware caches) of exploiting spatial and temporal locality discovered at run-time. We present the first automated compiler data allocation strategy that considers the presence of a software cache in SPM space, and makes decisions on which arrays should be accessed through it, at which times. Arrays could be accessed differently in different parts of a program, and our algorithm analyzes such uses and considers the possibility of selectively accessing an array through the SWC only when it is efficient, based on a cost model of the overheads involved in SPM/SWC transitions. We implemented our technique in an LLVM based framework and experimented with several applications on a Cell based machine. Our technique results in up to 82% overall performance improvement over a conventional SPM mapping algorithm and up to 27% over a typical SWC-enhanced implementation.
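To make the contrast described in the abstract concrete, the following C sketch illustrates the two access paths: a compile-time tiled copy into the SPM for an affine array reference, and a software-cache lookup for a data-dependent (irregular) reference. This is a minimal illustration only, not the paper's implementation; the helpers `spm_dma_get`, `spm_dma_put`, and `swc_read_i32` are hypothetical stand-ins for an SPM DMA primitive and a software-cache load routine.

```c
/* Illustrative sketch only -- not the paper's runtime API.
 * spm_dma_get/spm_dma_put and swc_read_i32 are hypothetical helpers
 * assumed to be provided by an SPM runtime / software-cache library. */
#include <stddef.h>
#include <stdint.h>

#define TILE 256

extern void    spm_dma_get(void *spm_dst, const void *main_src, size_t bytes);
extern void    spm_dma_put(const void *spm_src, void *main_dst, size_t bytes);
extern int32_t swc_read_i32(const int32_t *main_addr);  /* software-cache load */

/* Case 1: affine index -- the compiler can tile A into the SPM directly. */
void scale_regular(int32_t *A, size_t n, int32_t k)
{
    static int32_t tile[TILE];                  /* buffer resident in SPM space */
    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? (n - base) : TILE;
        spm_dma_get(tile, &A[base], len * sizeof(int32_t));  /* copy tile in    */
        for (size_t i = 0; i < len; i++)                     /* process in SPM  */
            tile[i] *= k;
        spm_dma_put(tile, &A[base], len * sizeof(int32_t));  /* copy tile back  */
    }
}

/* Case 2: data-dependent index -- reuse is unknown at compile time, so each
 * B[idx[i]] goes through the software cache, which fetches a whole line on a
 * miss and exploits whatever locality is discovered at run time. */
int64_t gather_irregular(const int32_t *B, const int32_t *idx, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += swc_read_i32(&B[idx[i]]);   /* SWC hit: cheap SPM access;
                                              SWC miss: one line-sized DMA */
    return sum;
}
```

Per the abstract, the paper's contribution is to automate the choice between these two paths: the compiler decides, per array and per program region, whether to map the data directly into SPM or route accesses through the software cache, using a cost model of the SPM/SWC transition overheads.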