2016 International Conference on Parallel Architecture and Compilation Techniques (PACT): latest papers, page 5

POSTER: An optimization of dataflow architectures for scientific applications

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2974054

Xiaowei Shen, Xiaochun Ye, Xu Tan, Da Wang, Zhimin Zhang, Dongrui Fan, Zhimin Tang

Citations: 6

Student research poster: Network controller emulation on a sidecore for unmodified virtual machines

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2971469

Arthur Kiyanovski

Citations: 0

Accelerating linked-list traversal through near-data processing

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2967958

B. Hong, Gwangsun Kim, Jung Ho Ahn, Yongkee Kwon, Hongsik Kim, John Kim

Abstract: Recent technology advances in memory system design, along with 3D stacking, have made near-data processing (NDP) more feasible to accelerate different workloads. In this work, we explore near-data processing for a fundamental operation - linked-list traversal (LLT). We propose a new NDP architecture that does not change the existing sequential programming model and does not require any modification to the processor microarchitecture. Instead, we exploit the packetized interface between the core and the memory modules to off-load LLT for NDP. We leverage a system with multiple memory modules (e.g., hybrid memory cube (HMC) modules) interconnected with a memory network and our initial evaluation shows that simply off-loading LLT computation to near-memory can actually reduce performance because of the additional off-chip memory network channel traversals. Thus, we first propose NDP-aware data localization to exploit locality - including locality within a single memory module and memory vault - to minimize latency and improve energy efficiency. In order to improve overall throughput and maximize parallelism, we propose batching multiple LLT operations together to amortize the cost of NDP by utilizing the highly parallel execution of NDP processing units and the high bandwidth of 3D stacked DRAM. The combination of NDP-aware data localization and batching can provide significant improvement in performance and energy efficiency compared to host-processing.
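
The abstract's two ideas, data localization and batching, can be illustrated with a toy model. This is a minimal sketch and not the authors' architecture: the `module` field, the crossing count as a stand-in for off-chip memory network traversals, and the helper names `build_list`, `traverse`, and `batch_by_module` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    key: int
    module: int                 # hypothetical id of the memory module that stores this node
    next: Optional[int] = None  # index of the successor in the node pool, None at the tail


def build_list(keys: List[int], modules: List[int]) -> List[Node]:
    """Chain the keys into one singly linked list, placing node i in modules[i]."""
    pool = [Node(k, m) for k, m in zip(keys, modules)]
    for i in range(len(pool) - 1):
        pool[i].next = i + 1
    return pool


def traverse(pool: List[Node], head: int, target: int) -> Tuple[bool, int]:
    """Search for target starting at head; return (found, module crossings)."""
    crossings = 0
    cur: Optional[int] = head
    current_module = pool[head].module
    while cur is not None:
        node = pool[cur]
        if node.module != current_module:
            crossings += 1          # the walk had to move to another memory module
            current_module = node.module
        if node.key == target:
            return True, crossings
        cur = node.next
    return False, crossings


def batch_by_module(
    pool: List[Node], requests: List[Tuple[int, int]]
) -> Dict[int, List[Tuple[int, int]]]:
    """Group (head, target) lookups by the module holding the head node,
    so each module receives one batched request instead of one per lookup."""
    batches: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for head, target in requests:
        batches[pool[head].module].append((head, target))
    return dict(batches)


keys = [10, 20, 30, 40, 50, 60]
scattered = build_list(keys, [0, 1, 0, 1, 0, 1])  # nodes alternate between two modules
localized = build_list(keys, [0, 0, 0, 0, 0, 0])  # whole list kept in one module

print(traverse(scattered, 0, 60))  # (True, 5)
print(traverse(localized, 0, 60))  # (True, 0)
print(len(batch_by_module(localized, [(0, 30), (0, 60), (0, 99)])))  # 1
```

In this model the scattered layout pays one crossing per hop, while the localized layout pays none, which mirrors the abstract's observation that naive off-loading can cost more than it saves. Real latency and energy figures depend on the memory network and are not modeled here.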

Citations: 34

A DSL compiler for accelerating image processing pipelines on FPGAs

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2967969

Nitin Chugh, Vinay Vasista, Suresh Purini, Uday Bondhugula

Citations: 50

Reduction drawing: Language constructs and polyhedral compilation for reductions on GPUs

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2967950

Chandan Reddy, Michael Kruse, Albert Cohen

Abstract: Reductions are common in scientific and data-crunching codes, and a typical source of bottlenecks on massively parallel architectures such as GPUs. Reductions are memory-bound, and achieving peak performance involves sophisticated optimizations. There exist libraries such as CUB and Thrust providing highly tuned implementations of reductions on GPUs. However, library APIs are not flexible enough to express user-defined reductions on arbitrary data types and array indexing schemes. Languages such as OpenACC provide declarative syntax to express reductions. Such approaches support a limited range of reduction operators and do not facilitate the application of complex program transformations in presence of reductions. We present language constructs that let a programmer express arbitrary reductions on user-defined data types matching the performance of tuned library implementations. We also extend a polyhedral compilation flow to process these user-defined reductions, enabling optimizations such as the fusion of multiple reductions, combining reductions with other loop transformations, and optimizing data transfers and storage in the presence of reductions. We implemented these language constructs and compilation methods in the PPCG framework and conducted experiments on multiple GPU targets. For single reductions the generated code performs on par with highly tuned libraries, and for multiple reductions it significantly outperforms both libraries and OpenACC on all platforms.
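
To make "user-defined reductions on user-defined data types" and "fusion of multiple reductions" concrete, here is a minimal sketch in plain Python. It is not the paper's language construct and not PPCG output: the `Stats` record, `lift`, `combine`, and `tree_reduce` are hypothetical names, and the pairwise loop only simulates sequentially the tree shape that a GPU reduction would execute in parallel.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Stats:
    """User-defined reduction value: min, max and sum fused into one record."""
    lo: float
    hi: float
    total: float


def lift(x: float) -> Stats:
    """Turn one input element into a reduction value."""
    return Stats(x, x, x)


def combine(a: Stats, b: Stats) -> Stats:
    """Associative, commutative combiner, so partial results merge in any order."""
    return Stats(min(a.lo, b.lo), max(a.hi, b.hi), a.total + b.total)


def tree_reduce(values: List[T], op: Callable[[T, T], T]) -> T:
    """Pairwise reduction; each pass of the while loop is one parallel step."""
    if not values:
        raise ValueError("cannot reduce an empty sequence without an identity")
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])   # odd element is carried to the next level
        level = nxt
    return level[0]


data = [3.0, -1.0, 4.0, 1.0, 5.0]
result = tree_reduce([lift(x) for x in data], combine)
print(result)  # Stats(lo=-1.0, hi=5.0, total=12.0)
```

Fusing the three reductions into one combiner means the input is read once rather than three times, which matters for a memory-bound operation.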

Citations: 17

Scaling data analytics with Moore's law

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2970375

K. Olukotun

Citations: 1

WearCore: A core for wearable workloads?

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-11 DOI: 10.1145/2967938.2967956

Sanyam Mehta, J. Torrellas

Citations: 5

Hybrid data dependence analysis for loop transformations

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-01 DOI: 10.1145/2967938.2974059

Diogo Sampaio, A. Ketterlin, L. Pouchet, F. Rastello

Citations: 2

Big data analytics on flash storage with accelerators

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-09-01 DOI: 10.1145/2967938.2970374

Arvind

Citations: 0

Vectorization of multibyte floating point data formats

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) Pub Date : 2016-01-26 DOI: 10.1145/2967938.2967966

Andrew Anderson, David Gregg

Citations: 6