Compiler managed dynamic instruction placement in a low-power code cache

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization; Pub Date: 2005-03-20; DOI: 10.1109/CGO.2005.13

Rajiv A. Ravindran, Pracheeti D. Nagarkar, Ganesh S. Dasika, E. Marsman, R. Senger, S. Mahlke, Richard B. Brown

Pages: 179-190

Citations: 53

Abstract

Modern embedded microprocessors use low-power on-chip memories called scratch-pad memories to store frequently executed instructions and data. Unlike traditional caches, scratch-pad memories lack complex tag-checking and comparison logic, which makes them efficient in area and power. In this work, we focus on exploiting scratch-pad memories to store hot code segments within an application. Static placement techniques place the most frequently executed portions of a program into the scratch-pad. However, static schemes are inherently limited because the contents of the scratch-pad memory cannot change at run time. In a large fraction of applications, the instruction memory footprint exceeds the scratch-pad memory size, which limits the usefulness of the scratch-pad. We propose a compiler-managed dynamic placement algorithm in which multiple hot code sequences, or traces, are overlapped with each other in the scratch-pad memory at different points in time during execution. Special copy instructions are provided to copy the traces into the scratch-pad memory at run time. Using a power estimate, the compiler initially selects the most frequent traces in an application for relocation into the scratch-pad memory. Through iterative code motion and redundancy elimination, copy instructions are inserted in infrequently executed regions of the code. For a 64-byte code cache, compiler-managed dynamic placement achieves an average energy improvement of 64% over the static solution in a low-power embedded microcontroller.
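The contrast between static and dynamic placement can be illustrated with a toy energy model. The sketch below is not the paper's algorithm: it omits the iterative code motion and redundancy elimination used to position copy instructions, and it assumes the program runs in distinct phases with one copy of the resident traces per phase. The `Trace` fields, the `phase` attribute, the greedy fill, and the energy constants `E_MAIN`, `E_SPM`, and `E_COPY` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Assumed relative energy costs per byte (illustrative integers, not measured values).
E_MAIN = 10  # fetch from main instruction memory
E_SPM = 2    # fetch from the scratch-pad
E_COPY = 12  # copy one byte from main memory into the scratch-pad


@dataclass(frozen=True)
class Trace:
    name: str
    size: int        # bytes occupied by the trace
    exec_count: int  # profiled number of executions
    phase: int       # hypothetical program phase in which the trace is hot


def greedy_fill(traces, capacity):
    """Pick traces in order of execution count until the scratch-pad is full."""
    chosen, used = [], 0
    for t in sorted(traces, key=lambda t: t.exec_count, reverse=True):
        if used + t.size <= capacity:
            chosen.append(t)
            used += t.size
    return chosen


def fetch_energy(traces, resident):
    """Energy to fetch every trace, given which ones reside in the scratch-pad."""
    names = {t.name for t in resident}
    return sum(
        t.size * t.exec_count * (E_SPM if t.name in names else E_MAIN)
        for t in traces
    )


def static_energy(traces, capacity):
    """One fixed scratch-pad image for the whole run (load cost ignored)."""
    return fetch_energy(traces, greedy_fill(traces, capacity))


def dynamic_energy(traces, capacity):
    """Refill the scratch-pad once per phase, paying the copy cost each time."""
    total = 0
    for phase in sorted({t.phase for t in traces}):
        hot = [t for t in traces if t.phase == phase]
        resident = greedy_fill(hot, capacity)
        total += sum(t.size for t in resident) * E_COPY
        total += fetch_energy(hot, resident)
    return total


TRACES = [
    Trace("A", size=48, exec_count=1000, phase=0),
    Trace("B", size=40, exec_count=900, phase=1),
    Trace("C", size=16, exec_count=100, phase=0),
]

if __name__ == "__main__":
    print("static :", static_energy(TRACES, capacity=64))
    print("dynamic:", dynamic_energy(TRACES, capacity=64))
```

In this example, traces A and B cannot both fit in a 64-byte scratch-pad, so the static scheme leaves B in main memory for the whole run. Because A and B are hot in different phases, the dynamic scheme can overlap them in the same scratch-pad space, and the copy cost is small compared with the fetch energy saved.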

