Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures

ACM International Conference on Computing Frontiers | Pub Date: 2012-05-15 | DOI: 10.1145/2212908.2212944

E. Garcia, Daniel A. Orozco, R. Khan, Ioannis E. Venetis, Kelly Livingston, G. Gao

Citations: 14

Abstract

This paper discusses the shortcomings of traditional static optimization techniques when applied to many-core architectures. We argue that these shortcomings result from the significantly different environment found in many-cores. We analyze previous attempts to optimize Dense Matrix Multiplication (DMM) that failed to achieve high performance despite extensive optimization effort. We have found that percolation (prefetching data) and scheduling play a central role in application performance. To overcome those difficulties, we have (1) fused dynamic scheduling and percolation into a dynamic percolation approach and (2) added further percolation operations. Our new techniques increased the performance of the application in our study from 44 GFLOPS (out of 80 GFLOPS possible) to 70.0 GFLOPS (operands in SRAM) or 65.6 GFLOPS (operands in DRAM).
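The idea of fusing dynamic scheduling with percolation can be illustrated with a small sketch. This is not the authors' implementation: the tile size `TILE`, the helper names (`copy_tile`, `fetch_task`, `worker`, `matmul_dynamic_percolation`), and the use of Python threads with a shared queue are all assumptions made for illustration. Each worker claims its next tile task from a shared queue (dynamic scheduling) and copies that task's operands into private buffers (percolation) in the same step, before it computes the task it already holds.

```python
import queue
import threading

TILE = 2  # hypothetical tile edge length; n must be a multiple of TILE


def copy_tile(m, r0, c0):
    """Percolate: copy a TILE x TILE block of m into a private local buffer."""
    return [row[c0:c0 + TILE] for row in m[r0:r0 + TILE]]


def fetch_task(tasks, a, b):
    """Claim the next task dynamically and percolate its operands.

    Returns None once the shared queue is empty.
    """
    try:
        i, j, k = tasks.get_nowait()
    except queue.Empty:
        return None
    return i, j, copy_tile(a, i, k), copy_tile(b, k, j)


def worker(tasks, a, b, c, lock):
    nxt = fetch_task(tasks, a, b)
    while nxt is not None:
        i, j, a_loc, b_loc = nxt
        # Claim and percolate the following task before computing this one.
        # On real hardware this copy would overlap with the compute below;
        # here the two steps simply run one after the other.
        nxt = fetch_task(tasks, a, b)
        partial = [[sum(a_loc[r][t] * b_loc[t][s] for t in range(TILE))
                    for s in range(TILE)] for r in range(TILE)]
        with lock:  # several tasks accumulate into the same tile of C
            for r in range(TILE):
                for s in range(TILE):
                    c[i + r][j + s] += partial[r][s]


def matmul_dynamic_percolation(a, b, n, workers=4):
    """Multiply two n x n matrices using a shared queue of tile tasks."""
    c = [[0] * n for _ in range(n)]
    tasks = queue.Queue()
    for i in range(0, n, TILE):
        for j in range(0, n, TILE):
            for k in range(0, n, TILE):
                tasks.put((i, j, k))
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, a, b, c, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c
```

Because tasks are claimed at run time rather than assigned statically, a worker that finishes early simply takes more tiles, and its operands are already staged locally when the compute step starts. The sketch shows only the control structure; it says nothing about the performance figures reported in the abstract.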

