2014 23rd International Conference on Parallel Architecture and Compilation (PACT): Latest Publications, Page 2

Protection and utilization in shared cache through rationing

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628120

Raj Parihar, Jacob Brock, C. Ding, Michael C. Huang

Citations: 4

Bitwise data parallelism in regular expression matching

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628079

R. Cameron, T. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin

Citations: 25

ArrayTool: A lightweight profiler to guide array regrouping

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628102

Xu Liu, Kamal Sharma, J. Mellor-Crummey

Abstract: Memory hierarchies in modern computer systems are complex; often, they include multi-level caches and multiple memory controllers on the same chip. Without careful design, programs suffer from unnecessary data movement between caches and memory, degrading performance and increasing energy consumption. Array regrouping can significantly improve data locality by improving spatial reuse of data and reducing cache contention. However, existing techniques for identifying opportunities for array regrouping are lacking in three ways. First, they provide inadequate information to guide regrouping. Second, the cost of monitoring employed by prior tools to identify regrouping opportunities limits the use of these methods in practice. Third, existing metrics for quantifying the benefits of array regrouping can lead to inappropriate transformations that hurt performance. In this paper, we describe ArrayTool, a lightweight profiler that guides array regrouping. ArrayTool has three unique capabilities. First, it focuses attention on arrays with significant access latency. Second, it identifies the feasibility and quantifies the benefits of regrouping arrays with lightweight array-centric profiling. Third, it works on both shared-memory and distributed-memory parallel programs. To illustrate the utility of ArrayTool, we employ it to analyze three benchmarks. Using the guidance it provides, we regroup program arrays, with performance improvements ranging from 25% to a factor of two.

Citations: 27
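To make the regrouping idea in the ArrayTool abstract concrete, here is a small Python cache model. It is not ArrayTool (which is a profiler) and does not reproduce its metrics; it only counts misses for a loop that reads `a[i]` and `b[i]` together, once with `a` and `b` stored as two separate arrays and once regrouped into a single interleaved array of pairs. The cache parameters (64-byte lines, 8-byte elements, a 4 KiB direct-mapped cache) and all function names are assumptions chosen for illustration.

```python
LINE_BYTES = 64   # assumed cache-line size
ELEM_BYTES = 8    # assumed element size (e.g. a double)


def direct_mapped_misses(addresses, num_sets, line_bytes=LINE_BYTES):
    """Count misses in a direct-mapped cache with `num_sets` one-line sets."""
    resident = [None] * num_sets      # memory block currently held by each set
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        s = block % num_sets
        if resident[s] != block:
            resident[s] = block       # miss: fetch the block, evicting the old one
            misses += 1
    return misses


def separate_arrays(n):
    """Addresses for `for i: use(a[i], b[i])` with b stored right after a."""
    base_b = n * ELEM_BYTES
    for i in range(n):
        yield i * ELEM_BYTES              # a[i]
        yield base_b + i * ELEM_BYTES     # b[i]


def regrouped_array(n):
    """Same loop after regrouping a and b into one array of (a, b) pairs."""
    for i in range(n):
        yield i * 2 * ELEM_BYTES                  # pair[i].a
        yield i * 2 * ELEM_BYTES + ELEM_BYTES     # pair[i].b


N, SETS = 1024, 64    # 1024 elements per array; 64 sets x 64 B = 4 KiB cache
print("separate: ", direct_mapped_misses(separate_arrays(N), SETS))
print("regrouped:", direct_mapped_misses(regrouped_array(N), SETS))
```

In this toy case the separate layout misses on every access (2048 misses) because `b` begins exactly 8 KiB after `a`, so `a[i]` and `b[i]` map to the same set and keep evicting each other; the regrouped layout misses only once per 64-byte line (256 misses).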

Velociraptor: An embedded compiler toolkit for numerical programs targeting CPUs and GPUs

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628097

R. Garg, L. Hendren

Citations: 8

Optimizing stencil code via locality of computation

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628121

Yulong Luo, Guangming Tan

Citations: 6

A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628124

S. Srinivasan, Nithesh Kurella, I. Koren, Rance Rodrigues, S. Kundu

Abstract: Asymmetric multicore processors (AMPs) consist of cores that execute the same ISA but differ in microarchitectural resources, performance, and power consumption. As the computational bottleneck of a workload shifts from one resource to another during execution, reassigning the workload to the core on which it runs most efficiently can improve overall energy efficiency. Simulation studies show that performance bottlenecks can shift frequently, often within a few thousand cycles. With frequent core hopping, the overhead of thread migration becomes significant. To mitigate this overhead, we propose a morphable core that can assume one of four possible configurations to address the dominant performance bottlenecks while retaining the same cache and registers. This way, the architectural state remains intact while the morphable core is reconfigured in resources and frequency. We then implement a runtime scheme that decides the best configuration to run on and switches configurations as necessary. Simulation results indicate that, on average, the proposed scheme improves performance/watt by 41%.

Citations: 3
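The abstract does not spell out the runtime decision policy, so the following is only a hypothetical sketch of one way such a scheme could choose among four configurations: predict throughput and power for each configuration, pick the one with the best performance per watt, and switch only when the predicted gain clears a margin, so that frequent bottleneck shifts do not cause thrashing. The configuration names, the numbers, the `Estimate` type, and the 5% switching margin are all assumptions, not details from the paper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    ips: float    # predicted instructions per second in this configuration
    watts: float  # predicted power draw in this configuration

    @property
    def perf_per_watt(self) -> float:
        return self.ips / self.watts


def choose_config(current: str, estimates: dict, min_gain: float = 0.05) -> str:
    """Pick the configuration for the next interval.

    `estimates` maps a (hypothetical) configuration name to its Estimate.
    We move to the best performance/watt configuration only if it beats the
    current one by more than `min_gain` (relative), a simple hysteresis that
    stands in for the cost of reconfiguring.
    """
    best = max(estimates, key=lambda name: estimates[name].perf_per_watt)
    if best == current:
        return current
    gain = estimates[best].perf_per_watt / estimates[current].perf_per_watt - 1.0
    return best if gain > min_gain else current


# Hypothetical predictions for one interval with four configurations.
interval = {
    "baseline":   Estimate(ips=2.0e9, watts=10.0),
    "wide_issue": Estimate(ips=2.6e9, watts=14.0),
    "big_window": Estimate(ips=2.4e9, watts=12.0),
    "low_power":  Estimate(ips=1.5e9, watts=5.0),
}
print(choose_config("baseline", interval))
```

Keeping the state (caches and registers) in place is what makes a per-interval decision like this plausible at all; with thread migration between separate cores, the switching margin would have to be much larger.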

A run-time power manager exploiting software parallelism

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628116

Simon Holmbacka, S. Lafond, J. Lilius

Citations: 1

PATS: Pattern aware scheduling and power gating for GPGPUs

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628105

Qiumin Xu, M. Annavaram

Abstract: General-purpose computing on graphics processing units (GPGPU) is an attractive option for achieving power-efficient throughput computing. However, the power efficiency of GPGPUs can be significantly curtailed in the presence of divergence. This paper evaluates two important facets of this problem. First, we study the branch divergence behavior of various GPGPU workloads. We show that only a few branch divergence patterns are dominant in most workloads; in fact, only five branch divergence patterns account for 60% of all divergent instructions in our workloads. Second, we exploit this branch divergence pattern bias to propose a new divergence-pattern-aware warp scheduler, called PATS. PATS prioritizes scheduling warps with the same divergence pattern so as to create long idleness windows for any given execution lane. These long idleness windows are then exploited to efficiently power-gate the unused lanes while amortizing the gating overhead. We describe the architectural implementation details of PATS and evaluate its power and performance impact. Our proposed design significantly improves the power-gating efficiency of GPGPUs with minimal performance overhead.

Citations: 44
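PATS is a hardware warp scheduler; the toy Python model below only illustrates the intuition in its abstract. Each warp is reduced to its active-lane mask, one warp issues per slot, and a lane counts as power-gateable during an idle run of at least `breakeven` slots (a stand-in for amortizing the gating overhead). Grouping warps that share a divergence pattern lengthens idle runs compared with plain round-robin order. The masks, the 4-lane width, the break-even value, and all function names are assumptions; the model ignores dependencies, latencies, and fairness.

```python
def pattern_aware_order(warps):
    """Reorder (warp_id, active_mask) pairs so warps with the same mask issue back to back.

    Patterns are emitted in order of first appearance (dicts keep insertion order).
    """
    groups = {}
    for warp_id, mask in warps:
        groups.setdefault(mask, []).append(warp_id)
    return [(wid, mask) for mask, wids in groups.items() for wid in wids]


def gateable_slots(schedule, lane, breakeven):
    """Slots in which `lane` sits in an idle run of at least `breakeven` slots."""
    total = run = 0
    for _, mask in schedule:
        if (mask >> lane) & 1:            # lane active in this warp
            if run >= breakeven:
                total += run
            run = 0
        else:                             # lane idle
            run += 1
    if run >= breakeven:                  # close a trailing idle run
        total += run
    return total


# Four warps on a hypothetical 4-lane machine, alternating two divergence patterns.
warps = [(0, 0b1111), (1, 0b0011), (2, 0b1111), (3, 0b0011)]
round_robin = warps
pats_like = pattern_aware_order(warps)
for lane in range(4):
    print(lane, gateable_slots(round_robin, lane, 2), gateable_slots(pats_like, lane, 2))
```

With round-robin issue, lanes 2 and 3 are idle only one slot at a time, below the assumed break-even of two slots; after grouping, the same lanes are idle for two consecutive slots and become gateable.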

Active learning accelerated automatic heuristic construction for parallel program mapping

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628128

William F. Ogilvie, Pavlos Petoumenos, Z. Wang, Hugh Leather

Abstract: Building effective optimization heuristics is a challenging task that often takes developers several months, if not years, to complete. Predictive modelling has recently emerged as a promising solution that automatically constructs heuristics from training data; however, obtaining this data can take months per platform. This problem is becoming ever more critical as the pace of architectural change increases. Indeed, if no solution is found, we will be left with out-of-date heuristics that cannot extract the best performance from modern machines. In this work, we present a low-cost predictive modelling approach for automatic heuristic construction that significantly reduces this training overhead. Typically in supervised learning, the training instances to evaluate are selected at random, regardless of how much useful information they carry, which wastes effort on parts of the space that contribute little to the quality of the produced heuristic. Our approach, on the other hand, uses active learning to select and focus only on the most useful training examples, thus reducing the training overhead. We demonstrate this technique by automatically creating a model that determines on which device to execute four parallel programs at differing problem dimensions for a representative CPU-GPU-based system. Our methodology is remarkably simple yet effective, making it a strong candidate for wide adoption. At high levels of classification accuracy, the average learning speed-up is 3× compared to the state of the art.

Citations: 8
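The abstract contrasts random selection of training instances with active learning. The sketch below illustrates that contrast on a deliberately tiny, hypothetical problem: a single feature (problem size), a hidden CPU/GPU crossover point, and a threshold model. It uses pool-based uncertainty sampling (query the candidate nearest the current decision boundary), which is one common active-learning strategy and not necessarily the one used in the paper. `run_experiment` stands in for the expensive profiling run that produces each label; the crossover value and all names are assumptions.

```python
import random

POOL = list(range(1000))   # hypothetical candidate problem sizes
CROSSOVER = 437            # hidden; stands in for the real CPU/GPU break-even size


def run_experiment(size):
    """Expensive labeling step (hypothetical): True if the GPU wins at `size`."""
    return size > CROSSOVER


def fit(labeled):
    """Threshold model: midpoint between the largest CPU-labeled size and
    the smallest GPU-labeled size observed so far."""
    cpu = [x for x, gpu_wins in labeled.items() if not gpu_wins]
    gpu = [x for x, gpu_wins in labeled.items() if gpu_wins]
    return (max(cpu, default=POOL[0]) + min(gpu, default=POOL[-1])) / 2


def accuracy(threshold):
    """Fraction of the pool classified correctly (toy evaluation on the pool itself)."""
    return sum((x > threshold) == run_experiment(x) for x in POOL) / len(POOL)


def train(budget, active, seed=0):
    """Label `budget` extra points, chosen actively or at random, and return the model."""
    rng = random.Random(seed)
    labeled = {x: run_experiment(x) for x in (POOL[0], POOL[-1])}
    for _ in range(budget):
        unlabeled = [x for x in POOL if x not in labeled]
        if not unlabeled:
            break
        if active:
            t = fit(labeled)
            # Uncertainty sampling: the candidate closest to the current
            # boundary is the one the model is least sure about.
            x = min(unlabeled, key=lambda u: abs(u - t))
        else:
            x = rng.choice(unlabeled)
        labeled[x] = run_experiment(x)
    return fit(labeled)


for budget in (5, 10, 20):
    print(budget, accuracy(train(budget, active=True)), accuracy(train(budget, active=False)))
```

Here the actively chosen points bisect the interval around the boundary, so ten labels pin down the crossover exactly; random selection spends many labels far from the boundary. This toy says nothing quantitative about the 3× speed-up reported in the paper.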

Rollback-free value prediction with approximate loads

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2014-08-24 DOI: 10.1145/2628071.2628110

Bradley Thwaites, Gennady Pekhimenko, H. Esmaeilzadeh, A. Yazdanbakhsh, O. Mutlu, Jongse Park, Girish Mururu, T. Mowry

Citations: 56