2014 23rd International Conference on Parallel Architecture and Compilation (PACT): latest papers

Protection and utilization in shared cache through rationing
Raj Parihar, Jacob Brock, C. Ding, Michael C. Huang
Abstract: Shared cache is generally optimized for overall throughput, fairness, or both. Increasingly in shared environments, e.g., compute clouds, users are unrelated to one another. In such circumstances, an overall gain in throughput does not justify an individual loss. This paper explores a new strategy for conservative sharing, which protects the cache occupancy of individual programs but still enables full cache sharing whenever there is unused space.
DOI: https://doi.org/10.1145/2628071.2628120
Published: 2014-08-24
Citations: 4
Bitwise data parallelism in regular expression matching
R. Cameron, T. Shermer, Arrvindh Shriraman, Kenneth S. Herdy, Dan Lin, Benjamin R. Hull, Meng Lin
Abstract: A new parallel algorithm for regular expression matching is developed and applied to the classical grep (global regular expression print) problem. Building on the bitwise data parallelism previously applied to the manual implementation of token scanning in the Parabix XML parser, the new algorithm represents a general solution to the problem of regular expression matching using parallel bit streams. On widely deployed commodity hardware using 128-bit SSE2 SIMD technology, our algorithm implementations can substantially outperform traditional grep implementations based on NFAs, DFAs or backtracking. A 5× or better performance advantage over the best available competitors is not atypical. The algorithms are also designed to scale with the availability of additional parallel resources such as the wider SIMD facilities (256-bit) of Intel AVX2 or future 512-bit extensions. Our AVX2 implementation showed a dramatic reduction in instruction count and a significant improvement in speed. Our GPU implementations show further acceleration.
DOI: https://doi.org/10.1145/2628071.2628079
Published: 2014-08-24
Citations: 25
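The parallel bit stream idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes Python's arbitrary-precision integers as stand-ins for SIMD registers, builds the character-class streams with a plain loop rather than a SIMD transposition, and hard-codes one illustrative pattern, `a[0-9]*z`. The helper names `class_stream`, `advance`, `match_star` and `match_ends` are hypothetical. The Kleene star is handled with the well-known carry-propagation trick, in which one long addition extends a marker across a whole run of class characters.

```python
def class_stream(text, chars):
    """Build a bit stream: bit i is set when text[i] belongs to chars."""
    bits = 0
    for i, ch in enumerate(text):
        if ch in chars:
            bits |= 1 << i
    return bits


def advance(marker):
    """Move every match marker one position forward in the text."""
    return marker << 1


def match_star(marker, cls):
    """Extend each marker through a run of class characters (cls*).

    Adding cls to the markers that sit on a run makes the carry ripple to
    the first position past the run; XOR with cls then recovers every
    position inside the run that is reachable from a marker.
    """
    return (((marker & cls) + cls) ^ cls) | marker


def match_ends(text):
    """Return end offsets (exclusive) of all matches of a[0-9]*z in text."""
    a = class_stream(text, "a")
    digit = class_stream(text, "0123456789")
    z = class_stream(text, "z")

    m = advance(a)             # positions just after an 'a'
    m = match_star(m, digit)   # ...optionally followed by digits
    m = advance(m & z)         # ...followed by a 'z'
    return [i for i in range(len(text) + 1) if (m >> i) & 1]
```

For example, `match_ends("a12z a9 az")` returns `[4, 10]`: the matches `a12z` and `az` end at offsets 4 and 10, while `a9` is rejected because no `z` follows. Every step operates on all text positions at once, which is the property that wider SIMD registers can exploit.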
ArrayTool: A lightweight profiler to guide array regrouping
Xu Liu, Kamal Sharma, J. Mellor-Crummey
Abstract: Memory hierarchies in modern computer systems are complex; often, they include multi-level caches and multiple memory controllers on the same chip. Without careful design, programs suffer from unnecessary data movement between caches and memory, degrading performance and increasing energy consumption. Array regrouping can significantly improve data locality by improving spatial reuse of data and reducing cache contention. However, existing techniques for identifying opportunities for array regrouping are lacking in three ways. First, they provide inadequate information to guide regrouping. Second, the cost of monitoring employed by prior tools to identify regrouping opportunities limits the use of these methods in practice. Third, existing metrics for quantifying the benefits of array regrouping can lead to inappropriate transformations that hurt performance. In this paper, we describe ArrayTool, a lightweight profiler that guides array regrouping. ArrayTool has three unique capabilities. First, it focuses attention on arrays with significant access latency. Second, it identifies the feasibility and quantifies the benefits of regrouping arrays with lightweight array-centric profiling. Third, it works on both shared-memory and distributed-memory parallel programs. To illustrate the utility of ArrayTool, we employ it to analyze three benchmarks. Using the guidance it provides, we regroup program arrays, with performance improvements ranging from 25% to a factor of two.
DOI: https://doi.org/10.1145/2628071.2628102
Published: 2014-08-24
Citations: 27
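As a rough illustration of the array regrouping transformation mentioned in the abstract above (not of ArrayTool itself, which is a profiler), the sketch below interleaves arrays that are always accessed together, so that the elements used in the same loop iteration sit next to each other in memory. The function names `regroup`, `dot_separate` and `dot_grouped` are hypothetical, and Python is used only to show the layout change; the cache benefit would be observed in a compiled language.

```python
from array import array


def regroup(*arrays):
    """Interleave equally long arrays: [x0, y0, x1, y1, ...]."""
    n = len(arrays[0])
    assert all(len(a) == n for a in arrays), "arrays must have equal length"
    grouped = array("d")
    for i in range(n):
        for a in arrays:
            grouped.append(a[i])
    return grouped


def dot_separate(x, y):
    """Original layout: each iteration touches two distant memory regions."""
    return sum(x[i] * y[i] for i in range(len(x)))


def dot_grouped(g):
    """Regrouped layout: each iteration touches two adjacent elements."""
    return sum(g[i] * g[i + 1] for i in range(0, len(g), 2))


x = array("d", [1.0, 2.0, 3.0])
y = array("d", [4.0, 5.0, 6.0])
g = regroup(x, y)
```

Here `g` holds `1.0, 4.0, 2.0, 5.0, 3.0, 6.0`, and both `dot_separate(x, y)` and `dot_grouped(g)` give `32.0`. Regrouping only pays off when the arrays really are accessed together, and deciding that is the question a profiler has to answer.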
Velociraptor: An embedded compiler toolkit for numerical programs targeting CPUs and GPUs
R. Garg, L. Hendren
Abstract: Developing just-in-time (JIT) compilers that allow scientific programmers to efficiently target both CPUs and GPUs is of increasing interest. However, building such compilers requires considerable effort. We present a reusable and embeddable compiler toolkit called Velociraptor that can be used to easily build compilers for numerical programs targeting multicores and GPUs. Velociraptor provides a new high-level IR called VRIR, which has been specifically designed for numeric computations, with rich support for arrays, plus support for high-level parallel and GPU constructs. A compiler developer uses Velociraptor by generating VRIR for key parts of an input program. Velociraptor provides an optimizing compiler toolkit for generating CPU and GPU code and also provides a smart runtime system to manage the GPU. To demonstrate Velociraptor in action, we present two proof-of-concept case studies: a GPU extension for a JIT implementation of the MATLAB language, and a JIT compiler for Python targeting CPUs and GPUs.
DOI: https://doi.org/10.1145/2628071.2628097
Published: 2014-08-24
Citations: 8
Optimizing stencil code via locality of computation
Yulong Luo, Guangming Tan
Abstract: Stencil computation is a performance-critical kernel used in scientific and engineering applications. We define the term locality of computation to guide stencil optimization by either the architecture or the compiler. By analogy with locality of reference, computational behavior is likewise classified into spatial locality and temporal locality. This paper develops an equivalent computation elimination (ECE) approach for multi-level loops that exploits temporal locality of computation. The strength of ECE lies in an intermediate-based search algorithm that eliminates inter-iteration computational redundancies across all possible combinations, and a multi-dimensional replacement algorithm that replaces redundant computation across loops of multiple dimensions. We implemented ECE in the ROSE compiler infrastructure. The experiment shows that ECE improves performance by 20% on average owing to its awareness of temporal locality.
DOI: https://doi.org/10.1145/2628071.2628121
Published: 2014-08-24
Citations: 6
A runtime support mechanism for fast mode switching of a self-morphing core for power efficiency
S. Srinivasan, Nithesh Kurella, I. Koren, Rance Rodrigues, S. Kundu
Abstract: Asymmetric multicore processors (AMPs) consist of cores executing the same ISA, but differing in microarchitectural resources, performance, and power consumption. As the computational bottleneck of a workload shifts from one resource to the next during execution, reassigning it to the core where it runs most efficiently can improve the overall energy efficiency. Simulation studies show that the performance bottlenecks can shift frequently, often within a few thousand cycles. With frequent core hopping, the overhead of thread migration becomes significant. To mitigate this overhead, we propose a morphable core that can assume one of four possible configurations to address the dominant performance bottlenecks, while retaining the same cache and registers. This way the architectural state remains intact while the morphable core is reconfigured in resources and frequency. We then implement a runtime scheme to decide the best configuration to run on and switch configuration as necessary. Simulation results indicate that, on average, the proposed scheme improves performance per watt by 41%.
DOI: https://doi.org/10.1145/2628071.2628124
Published: 2014-08-24
Citations: 3
A run-time power manager exploiting software parallelism
Simon Holmbacka, S. Lafond, J. Lilius
Abstract: In this paper we unify two existing power-saving techniques, DVFS and DPM (sleep states), and show how an optimized balance between dynamic and static power leads to minimal energy consumption.
DOI: https://doi.org/10.1145/2628071.2628116
Published: 2014-08-24
Citations: 1
PATS: Pattern aware scheduling and power gating for GPGPUs
Qiumin Xu, M. Annavaram
Abstract: General purpose computing using graphics processing units (GPGPUs) is an attractive option to achieve power-efficient throughput computing. But the power efficiency of GPGPUs can be significantly curtailed in the presence of divergence. This paper evaluates two important facets of this problem. First, we study the branch divergence behavior of various GPGPU workloads. We show that only a few branch divergence patterns are dominant in most workloads. In fact, only five branch divergence patterns account for 60% of all the divergent instructions in our workloads. In the second part of this work we exploit this branch divergence pattern bias to propose a new divergence pattern aware warp scheduler, called PATS. PATS prioritizes scheduling warps with the same divergence pattern so as to create long idleness windows for any given execution lane. The long idleness windows are then exploited for efficiently power gating the unused lanes while amortizing the gating overhead. We describe the architectural implementation details of PATS and evaluate its power and performance impact. Our proposed design significantly improves power gating efficiency of GPGPUs with minimal performance overhead.
DOI: https://doi.org/10.1145/2628071.2628105
Published: 2014-08-24
Citations: 44
Active learning accelerated automatic heuristic construction for parallel program mapping
William F. Ogilvie, Pavlos Petoumenos, Z. Wang, Hugh Leather
Abstract: Building effective optimization heuristics is a challenging task which often takes developers several months if not years to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data; however, obtaining this data can take months per platform. This is becoming an ever more critical problem as the pace of change in architecture increases. Indeed, if no solution is found we shall be left with out-of-date heuristics which cannot extract the best performance from modern machines. In this work, we present a low-cost predictive modelling approach for automatic heuristic construction which significantly reduces this training overhead. Typically in supervised learning the training instances to evaluate are selected at random, regardless of how much useful information they carry, but this wastes effort on parts of the space that contribute little to the quality of the produced heuristic. Our approach, on the other hand, uses active learning to select and focus only on the most useful training examples and thus reduces the training overhead. We demonstrate this technique by automatically creating a model to determine on which device to execute four parallel programs at differing problem dimensions for a representative CPU-GPU based system. Our methodology is remarkably simple and yet effective, making it a strong candidate for wide adoption. At high levels of classification accuracy the average learning speed-up is 3×, as compared to the state-of-the-art.
DOI: https://doi.org/10.1145/2628071.2628128
Published: 2014-08-24
Citations: 8
Rollback-free value prediction with approximate loads
Bradley Thwaites, Gennady Pekhimenko, H. Esmaeilzadeh, A. Yazdanbakhsh, O. Mutlu, Jongse Park, Girish Mururu, T. Mowry
Abstract: This paper demonstrates how to utilize the inherent error resilience of a wide range of applications to mitigate the memory wall, the discrepancy between core and memory speed. We define a new microarchitecturally-triggered approximation technique called rollback-free value prediction. This technique predicts the value of safe-to-approximate loads when they miss in the cache without tracking mispredictions or requiring costly recovery from misspeculations. This technique mitigates the memory wall by allowing the core to continue computation without stalling for long-latency memory accesses. Our detailed study of the quality trade-offs shows that with a modern out-of-order processor, an average performance improvement of 8% (up to 19%) is possible with an average quality loss of 0.8% (up to 1.8%) on an approximable subset of SPEC CPU 2000/2006.
DOI: https://doi.org/10.1145/2628071.2628110
Published: 2014-08-24
Citations: 56