Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004 (INTERACT-8): Latest Articles

Fast indexing for blocked array layouts to improve multi-level cache locality
Evangelia Athanasaki, N. Koziris
DOI: 10.1109/INTERA.2004.1299515
Abstract: One of the key challenges computer architects and compiler writers are facing is the increasing discrepancy between processor cycle times and main memory access times. To overcome this problem, program transformations that decrease cache misses are used to reduce the average latency of memory accesses. Tiling is a widely used loop iteration reordering technique for improving locality of references. In this paper, we further reduce cache misses by restructuring the memory layout of multi-dimensional arrays that are accessed by tiled instruction code. In our method, array elements are stored in a blocked way, exactly as they are swept by the tiled instruction stream. We present a straightforward way to translate multi-dimensional indexing of arrays into their blocked memory layout using simple binary-mask operations. Indices for such array layouts are easily calculated based on the algebra of dilated integers, similarly to Morton-order indexing. Actual experimental results on three different hardware platforms, using 5 benchmarks, illustrate that execution time is greatly improved when combining tiled code with tiled array layouts and binary-mask-based index translation functions. Both TLB and L1 cache misses are concurrently minimized for the same tile size; thus, applying the proposed layouts, locality of references is greatly improved. Finally, simulations using the SimpleScalar tool verify that the enhanced performance is due to the considerable reduction of cache misses at all levels of the memory hierarchy.
Published: 2004-05-24
Citations: 13
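The binary-mask index translation the abstract describes can be illustrated with dilated (bit-interleaved) integers. The masks below are the standard 32-bit Morton-order dilation constants, and the function names are illustrative; the paper's exact mask scheme for arbitrary tile sizes may differ:

```python
# Sketch of Morton-order (Z-order) indexing via dilated integers.
# dilate() spreads the bits of a 16-bit coordinate apart so that two
# dilated coordinates can be combined with a shift and OR, turning a
# (row, col) pair into a single blocked-layout offset.

def dilate(x: int) -> int:
    """Insert a zero bit between consecutive bits of a 16-bit integer."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x

def morton_index(row: int, col: int) -> int:
    """Interleave the bits of row and col into a Morton (Z-order) offset."""
    return (dilate(row) << 1) | dilate(col)

EVEN = 0x55555555  # bit positions holding the column coordinate
ODD  = 0xAAAAAAAA  # bit positions holding the row coordinate

def inc_col(z: int) -> int:
    """Advance the Morton index to the next column using only mask ops,
    the 'algebra of dilated integers' increment: force the row bits to
    all-ones, add 1 so the carry ripples through them, then mask."""
    return (((z | ODD) + 1) & EVEN) | (z & ODD)
```

The `inc_col` trick is what makes sequential sweeps over a blocked layout cheap: no multiplication or division is needed to step to the neighboring element.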
Garbage collector refinement for new dynamic multimedia applications on embedded systems
J. M. Velasco, David Atienza Alonso, F. Catthoor, F. Tirado, Katzalin Olcoz, J. Mendias
DOI: 10.1109/INTERA.2004.1299507
Abstract: Consumer embedded devices must concurrently execute multiple services (e.g. multimedia applications) that are dynamically triggered by the user. For these new embedded multimedia applications, the dynamic memory subsystem is currently one of the main sources of power consumption, and its inattentive management can severely affect the performance and power consumption of the whole system. Therefore, suitable automatic mechanisms for reusing dynamic storage (i.e. garbage collector mechanisms) that take the underlying embedded devices into account would allow designers to design these systems more efficiently. However, methodologies to explore and implement convenient garbage collector mechanisms for embedded devices have not yet been developed. In this paper we propose a system-level method to define and explore the vast design space of possible garbage collector mechanisms, which makes it possible to define custom garbage collector implementations for the final embedded devices.
Published: 2004-05-24
Citations: 0
SimSnap: fast-forwarding via native execution and application-level checkpointing
P. Szwed, Daniel Marques, Robert M. Buels, S. Mckee, M. Schulz
DOI: 10.1109/INTERA.2004.1299511
Abstract: As systems become more complex, conducting cycle-accurate simulation experiments becomes more time consuming. Most approaches to accelerating simulation attempt to choose simulation points such that the performance of the program portions modeled in detail is representative of whole-program behavior. To maintain or build the correct architectural state, "fast-forwarding" models a series of instructions before a desired simulation point. This fast-forwarding is usually performed by functional simulation: modeling the effects of instructions without all the details of pipeline stages and individual μ-ops. We present another fast-forwarding technique, SimSnap, that leverages native execution and application-level checkpointing. We demonstrate the viability of our approach by moving checkpointed versions of SPLASH-2 benchmarks between an Alpha 21264 system and SimpleScalar Version 4.0 Alpha-Sim. The reduction in experiment times is dramatic, with minimal perturbation of the benchmark programs.
Published: 2004-05-24
Citations: 25
Cool-Fetch: a compiler-enabled IPC estimation based framework for energy reduction
O. Unsal, I. Koren, C. M. Krishna, C. A. Moritz
DOI: 10.1109/INTERA.2004.1299509
Abstract: With power consumption becoming an increasingly important factor, it is necessary to reevaluate traditional, power-intensive architectural techniques and their relative performance benefits. We believe that combined architecture-compiler efforts open up new and efficient ways to retain the performance benefits of modern architectures while addressing their power impact. In this paper, we present Cool-Fetch, an architecture-compiler approach to reducing energy consumption in the processor. While we mainly target the fetch unit, an important side effect of our approach is that we obtain energy savings in many other parts of the processor. The explanation is that the fetch unit often runs substantially ahead of execution, bringing instructions into different stages of the processor that may never be executed. We have found that although the degree of instruction-level parallelism (ILP) of a program tends to vary over time, it can be statically estimated by the compiler. Our instructions-per-clock (IPC) estimation scheme uses monotonic dataflow analysis and simple heuristics to guide a fetch-throttling mechanism. We develop the necessary architectural support and include its power overhead. Using Mediabench and SPEC2000 applications, we obtain up to 15% total energy savings in the processor with generally little performance degradation. We also provide a comparison of Cool-Fetch with previously proposed hardware-only dynamic fetch-throttling schemes.
Published: 2004-05-24
Citations: 13
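The abstract gives the shape of the scheme (a static IPC estimate per program region drives a fetch-throttling decision) but not its formulas. The estimator, threshold, and names below are invented purely for illustration and are not the paper's actual heuristics:

```python
# Illustrative sketch (not Cool-Fetch's actual analysis): a crude
# compiler-side IPC estimate drives a throttle-or-not decision per
# program region. All constants here are hypothetical.

def estimate_ipc(deps_per_instr: float, issue_width: int = 4) -> float:
    """Crude static IPC estimate: more dataflow dependences per
    instruction means less exploitable instruction-level parallelism."""
    return min(float(issue_width), issue_width / (1.0 + deps_per_instr))

def should_throttle(est_ipc: float, threshold: float = 1.5) -> bool:
    """Throttle fetch (e.g. gate it on alternate cycles) when the
    statically estimated IPC falls below the threshold: the fetch unit
    would only run further ahead of a slow execution stream."""
    return est_ipc < threshold
```

The design intuition matches the abstract: when estimated IPC is low, instructions fetched far ahead are likely to sit in (or be flushed from) the pipeline, so slowing fetch saves energy with little performance cost.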
Data movement optimization for software-controlled on-chip memory
M. Fujita, Masaaki Kondo, Hiroshi Nakamura
DOI: 10.1109/INTERA.2004.1299516
Abstract: To overcome the performance degradation caused by the disparity between processor and main memory speeds, several new VLSI architectures have been proposed that provide software-controlled on-chip memory in addition to the conventional cache. However, to achieve higher performance by utilizing such on-chip memory, users must specify data allocation/replacement on the software-controlled on-chip memory as well as data transfers between the on-chip and off-chip memories. Because these properties are controlled automatically by hardware in conventional caches, the cost of optimizing a program becomes a matter that should be considered. In this paper, we propose a data movement optimization technique for software-controlled on-chip memory. We evaluated the proposed method using two applications. The results reveal that the proposed technique can drastically reduce memory stall cycles and achieve high performance.
Published: 2004-05-24
Citations: 2
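The contrast with a hardware cache can be pictured as follows: with software-controlled on-chip memory, the program itself copies each tile of data into the fast memory before computing on it. The tile size, names, and the list-based "scratchpad" below are illustrative only; a real system would issue explicit DMA transfers, and the paper's optimization concerns scheduling those transfers well:

```python
# Illustrative sketch of software-managed on-chip memory: the program,
# not the hardware, stages each tile into a small fast buffer before
# computing. TILE and the helper name are hypothetical.

TILE = 4  # elements that fit in the (hypothetical) scratchpad

def process(off_chip: list[float]) -> float:
    """Sum of squares, computed tile by tile through a scratchpad."""
    total = 0.0
    for base in range(0, len(off_chip), TILE):
        scratchpad = off_chip[base:base + TILE]   # explicit tile copy in
        total += sum(x * x for x in scratchpad)   # compute on on-chip data
    return total
```

Because every copy is explicit, the compiler (or programmer) decides what is resident and when transfers happen; the optimization opportunity is overlapping or eliminating these copies, which a hardware cache would instead handle implicitly, and sometimes poorly.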
Exploitation of instruction-level parallelism for optimal loop scheduling
Jan Müller, D. Fimmel, R. Merker
DOI: 10.1109/INTERA.2004.1299506
Abstract: We present a loop scheduling approach that optimally exploits instruction-level parallelism. We develop a flow graph model of the resource constraints that allows a more efficient implementation. The method supports heterogeneous processor architectures and pipelined functional units. Our linear programming implementation produces an optimal loop schedule, making the technique applicable to production compilation and hardware parametrization. Compared to earlier approaches, our approach can provide faster loop schedules and a significant reduction in problem complexity and solution time.
Published: 2004-05-24
Citations: 3
Link-time optimization techniques for eliminating conditional branch redundancies
Manel Fernández, R. Espasa
DOI: 10.1109/INTERA.2004.1299513
Abstract: Optimizations performed at link time or applied directly to final program executables have received increased attention in recent years. This work discusses the discovery and elimination of redundant conditional branches in the context of a link-time optimizer, an optimization that we call conditional branch redundancy elimination (CBRE). Our experiments show that around 20% of the conditional branches in a program can be considered redundant because their outcomes can be determined from a previous short dynamic execution frame. We then present several CBRE algorithms targeted at optimizing away these redundancies. Our results show that around 5% of the detected conditional branch redundancy can indeed be eliminated, which translates into execution time reductions of around 4%. We also give accurate measurements of the impact of applying CBRE on code growth.
Published: 2004-05-24
Citations: 2
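A rough way to picture the 20% figure: scan a dynamic branch trace and count branches whose (address, outcome) pair already appeared among the few branches executed just before them, so their outcome was already determined. The frame length and trace format below are hypothetical, not the paper's actual measurement method:

```python
# Illustrative redundancy counter over a dynamic branch trace.
# Each trace entry is (branch_pc, taken). A branch is counted redundant
# when the same branch resolved with the same outcome within the last
# FRAME executed branches. FRAME is a hypothetical parameter.

FRAME = 4  # reach of the "short dynamic execution frame"

def count_redundant(trace: list[tuple[int, bool]]) -> int:
    redundant = 0
    for i, (pc, outcome) in enumerate(trace):
        frame = trace[max(0, i - FRAME):i]
        if (pc, outcome) in frame:   # outcome already determined earlier
            redundant += 1
    return redundant
```

A link-time optimizer has the whole executable in view, which is what lets it rewrite such correlated branch pairs (for example, by duplicating the short code path between them) where a per-module compiler could not.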
Reducing fetch architecture complexity using procedure inlining
Oliverio J. Santana, Alex Ramírez, M. Valero
DOI: 10.1109/INTERA.2004.1299514
Abstract: Fetch engine performance is seriously limited by the branch prediction table access latency. This fact has led to the development of hardware mechanisms, like prediction overriding, aimed at tolerating this latency. However, prediction overriding requires additional support and recovery mechanisms, which increases fetch architecture complexity. In this paper, we show that this increase in complexity can be avoided if the interaction between the fetch architecture and software code optimizations is taken into account. We use aggressive procedure inlining to generate long streams of instructions that the fetch engine uses as its basic prediction unit. We call a sequence of instructions from the target of a taken branch to the next taken branch an instruction stream. These instruction streams are long enough to feed the execution engine with instructions during multiple cycles while a new stream prediction is being generated, thus hiding the prediction table access latency. Our results show that the length of instruction streams compensates for the increase in the instruction cache miss rate caused by inlining. We show that, using procedure inlining, the need for a prediction overriding mechanism is avoided, reducing fetch engine complexity.
Published: 2004-05-24
Citations: 3
Dynamic management of nursery space organization in generational collection
J. M. Velasco, A. Ortiz, Katzalin Olcoz, F. Tirado
DOI: 10.1109/INTERA.2004.1299508
Abstract: The use of automatic memory management in object-oriented languages like Java is becoming widely accepted due to its software engineering benefits, its reduction in programming time, and safety aspects. Nevertheless, the complexity of garbage collection imposes an important overhead on the virtual machine. Until now, strategies in garbage collection have focused on defining and fixing regions in the heap based on different approaches and algorithms. Each of these strategies can beat the others depending on the data behavior of a specific application, but they fail to take advantage of the available resources in other cases; there is no static solution to this problem. In this paper, we present and evaluate two dynamic strategies, based on data lifetime, that reallocate at run time the reserved space in the nursery of generational Appel collectors. The dynamic tuning of the reserved space produces a drastic reduction in the number of collections and the total collection time, and has a clear effect on the final execution time.
Published: 2004-05-24
Citations: 6
Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems
Gilles A. Pokam, F. Bodin
DOI: 10.1109/INTERA.2004.1299510
Abstract: Managing the energy-performance tradeoff has become a major challenge with embedded systems. The cache hierarchy is a typical example where this tradeoff plays a central role. With the increasing level of integration density, a cache can feature millions of transistors, consuming a significant portion of the energy; at the same time, however, a cache also significantly improves performance. Configurable caches are becoming the de facto solution for dealing with these issues efficiently. Such caches are equipped with mechanisms that enable one to resize them dynamically. With regard to embedded systems, however, many of these mechanisms restrict configurability to the application level. We propose in this paper to modify the structure of a configurable cache to offer embedded compilers the opportunity to reconfigure it according to a program's dynamic phases, rather than on a per-application basis. Our experimental results show that the proposed scheme has the potential to improve the compiler's effectiveness at reducing energy consumption, while not excessively degrading performance.
Published: 2004-05-24
Citations: 10