Intra-Task Parallelism in Automotive Real-Time Systems
Remko van Wagensveld, Tobias Wägemann, Niklas Hehenkamp, Ramin Tavakoli Kolagari, Ulrich Margull, Ralph Mader
DOI: 10.1145/3178442.3178449

Abstract: Many recent Engine Management Systems (EMSs) have multicore processors. This poses new challenges for the developers of those systems, as most of them are not familiar with multicore programming. Additionally, many EMSs have real-time requirements that must be met. This paper introduces embedded parallel design patterns (ePDPs), which help developers solve common problems encountered when parallelizing legacy code for EMSs or other embedded devices. We present a novel ePDP, the Supercore pattern, which reduces the overhead introduced by forking and joining control graphs. To show the effectiveness of this pattern, we simulated and executed it on a real-world EMS and show that it reduces the response time of tasks with real-time requirements. The paper also presents concrete extensions to AUTOSAR and EAST-ADL to enable modelling of the Supercore pattern in automotive modelling standards.
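The fork/join overhead targeted by the Supercore pattern can be illustrated with a minimal sketch (our hypothetical construction, not the paper's implementation): repeatedly creating and joining workers for every batch of work is costly, whereas a persistent pool kept alive across batches removes that per-batch cost.

```python
# Hypothetical illustration of the cost the Supercore pattern avoids:
# re-forking workers per batch vs. one persistent worker pool.
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * x

def fork_join_per_batch(batches):
    # Naive: a fresh pool is forked and joined for every batch.
    results = []
    for batch in batches:
        with ThreadPoolExecutor(max_workers=2) as pool:
            results.extend(pool.map(work, batch))
    return results

def persistent_pool(batches):
    # Supercore-style idea: workers stay alive across batches, so the
    # coordinating core only hands out work instead of forking/joining.
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for batch in batches:
            results.extend(pool.map(work, batch))
    return results

batches = [[1, 2, 3], [4, 5], [6]]
assert fork_join_per_batch(batches) == persistent_pool(batches) == [1, 4, 9, 16, 25, 36]
```

Both variants compute the same results; the difference is only in how often worker threads are created and torn down.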
Understanding Parallelization Tradeoffs for Linear Pipelines
Aristeidis Mastoras, T. Gross
DOI: 10.1145/3178442.3178443

Abstract: Pipelining techniques execute some loops with cross-iteration dependences in parallel by partitioning the loop body into a sequence of stages such that the data dependences are not violated. Obtaining good performance for all kinds of loops is challenging, and current techniques, e.g., PS-DSWP and LBPP, have difficulty handling load-imbalanced loops. In particular, for loop iterations that differ substantially in execution time, these techniques achieve load balancing by assigning work to threads using round-robin scheduling. Algorithms that rely on work-stealing, e.g., Piper, handle load-imbalanced loops efficiently, but the high overhead of the scheduler implies poor performance for fine-grained loops. In this paper, we present Proteas, a programming model that allows tradeoffs between load balancing, partitioning, mapping, synchronization, chunking, and scheduling. Proteas provides a set of simple directives to express the different mappings of a loop's parallelism. A source-to-source compiler then generates parallel code to support experimentation with Proteas. The directives allow us to investigate various tradeoffs and achieve performance comparable to PS-DSWP and LBPP. In addition, the directives make a meaningful comparison to Piper possible. We present a performance evaluation on a 32-core system for a set of popular pipelined programs selected from three widely used benchmark suites. The results show the tradeoffs of the different techniques and their parameters. Moreover, they show that efficient handling of load-imbalanced fine-grained loops remains the main challenge.
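The pipelining scheme described above can be sketched in a toy form (our construction; Proteas' actual directives are not shown in the abstract): a sequential stage carries the cross-iteration dependence, and its results are handed round-robin to workers that execute the parallel stage.

```python
# Toy two-stage pipeline: stage S1 is sequential (it carries the
# cross-iteration dependence), stage S2 is independent per iteration
# and is distributed round-robin across worker threads.
import queue
import threading

def run_pipeline(data, n_workers=2):
    qs = [queue.Queue() for _ in range(n_workers)]
    out = [None] * len(data)

    def s2_worker(q):
        while True:
            item = q.get()
            if item is None:
                return
            i, v = item
            out[i] = v * 10                    # stage S2: independent work

    workers = [threading.Thread(target=s2_worker, args=(q,)) for q in qs]
    for w in workers:
        w.start()

    acc = 0
    for i, x in enumerate(data):
        acc += x                               # stage S1: running dependence
        qs[i % n_workers].put((i, acc))        # round-robin scheduling
    for q in qs:
        q.put(None)
    for w in workers:
        w.join()
    return out

assert run_pipeline([1, 2, 3, 4]) == [10, 30, 60, 100]
```

Round-robin assignment keeps scheduling overhead minimal but, as the paper notes, it load-balances poorly when iteration costs vary widely.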
Supporting Fine-grained Dataflow Parallelism in Big Data Systems
Sebastian Ertel, Justus Adam, J. Castrillón
DOI: 10.1145/3178442.3178447

Abstract: Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in a data-parallel fashion. It has recently been reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into higher throughput. This is particularly the case for applications that have inherently sequential, computationally intensive phases. In this paper, we analyze the data-processing cores of state-of-the-art big data systems to find the cause of these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicitly parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.
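The kind of pipeline parallelism identified above can be sketched generically (a producer/consumer illustration of ours, not Hadoop code): deserializing one record overlaps with processing the previous one through a bounded queue.

```python
# Generic sketch of pipeline parallelism in a record-processing core:
# the "deserialize" stage and the "process" stage overlap in time.
import queue
import threading

def pipeline(records, process):
    q = queue.Queue(maxsize=4)   # bounded hand-off between the stages
    out = []

    def consumer():
        while True:
            rec = q.get()
            if rec is None:
                return
            out.append(process(rec))   # stage 2: process the record

    t = threading.Thread(target=consumer)
    t.start()
    for raw in records:
        q.put(int(raw))                # stage 1: "deserialize" the record
    q.put(None)                        # end-of-stream marker
    t.join()
    return out

assert pipeline(["1", "2", "3"], lambda x: x + 1) == [2, 3, 4]
```

A single consumer preserves record order; adding task-level parallelism would mean running further independent stages on their own threads.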
Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators
A. Tomás, Rafael Rodríguez-Sánchez, Sandra Catalán, E. S. Quintana-Ortí
DOI: 10.1145/3178442.3178448

Abstract: In this paper we show that two-stage algorithms for the singular value decomposition (SVD) benefit significantly from an alternative first stage that reduces the matrix to an intermediate band form with the same upper and lower bandwidth. This contrasts with the conventional approach, which produces an upper triangular band matrix. Our alternative easily accommodates a look-ahead strategy, with only minor constraints on the relation between the algorithmic block size and the bandwidth, yielding a high-performance implementation on current servers equipped with multicore technology and graphics processors.
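The two-stage structure the paper builds on can be summarized as follows (standard formulation; the notation here is ours, not the paper's):

```latex
% Stage 1 reduces the dense matrix A to a band matrix B via orthogonal
% transforms; stage 2 computes the SVD of B, and the factors combine:
\[
  A = U_1\, B\, V_1^{T}, \qquad
  B = U_2\, \Sigma\, V_2^{T}
  \;\Longrightarrow\;
  A = (U_1 U_2)\, \Sigma\, (V_1 V_2)^{T}.
\]
% The paper's contribution concerns the shape of B: a band matrix with
% equal upper and lower bandwidth, instead of the conventional upper
% triangular band matrix, which eases look-ahead in the first stage.
```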
Fast and Accurate Performance Analysis of Synchronization
Mario Badr, Natalie D. Enright Jerger
DOI: 10.1145/3178442.3178446

Abstract: Understanding parallel-program bottlenecks is critical to designing more efficient and performant parallel architectures. Synchronization is a prime example of a potential bottleneck, but it is a necessary evil when writing parallel programs: we must enforce correct access to shared data. Even the most expert programmers may find synchronization to be a significant overhead in their applications. Techniques to mitigate synchronization overhead include speculative lock elision, faster hardware barriers, and load balancing via dynamic voltage and frequency scaling and thread migration to asymmetric cores. A key insight is that the timing of synchronization events, which is affected not only by the progress of the current thread but also by that of others, is fundamental to an application's performance. To enable a better understanding of multithreaded applications, we propose an analytical model centered on the timing and ordering of synchronization events. Our model allows researchers across the stack to evaluate the performance of applications on future, not-yet-existing systems and architectures. Compared to real hardware, our model estimates performance with an average error of 7.2% across thirteen benchmarks and can generate per-thread performance characteristics in less than a minute on average for very large (native) inputs.
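Why event timing, not just per-thread totals, determines performance can be shown with a minimal model of our own (far simpler than the paper's): under barrier synchronization each interval costs as much as its slowest thread, so total time is the sum of per-interval maxima.

```python
# Minimal barrier-timing model: total runtime is the sum over barrier
# intervals of the slowest thread in that interval, and each thread's
# wait time is how long it idles at each barrier.
def barrier_model(phase_times):
    # phase_times[t][p]: compute time of thread t in barrier interval p
    n_phases = len(phase_times[0])
    total = 0.0
    waits = [0.0] * len(phase_times)
    for p in range(n_phases):
        slowest = max(times[p] for times in phase_times)
        total += slowest                     # all threads leave together
        for t, times in enumerate(phase_times):
            waits[t] += slowest - times[p]   # idle time at this barrier
    return total, waits

# Two threads, two intervals; each thread computes for 6 time units in
# total, yet the run takes 9 because the slow phases alternate.
total, waits = barrier_model([[4, 2], [1, 5]])
assert total == 9
assert waits == [3.0, 3.0]
```

The example makes the ordering effect explicit: summing per-thread work would predict 6 time units, but the interleaving of slow phases yields 9.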
VAIL: A Victim-Aware Cache Policy for Improving Lifetime of Hybrid Memory
Youchuang Jia, Fang Zhou, Xiang Gao, Song Wu, Hai Jin, Xiaofei Liao, Pingpeng Yuan
DOI: 10.1145/3178442.3178451

Abstract: Emerging Non-Volatile Memory (NVM) technologies have been introduced to remedy the shortcomings of current DRAM-based memory systems. However, NVM has limited write endurance, which can severely restrict the performance of the memory system. To relieve this limitation, we propose VAIL, a victim-aware cache policy for DRAM/NVM hybrid memory systems. VAIL takes the eviction locality of victims from the DRAM cache into consideration to reduce writebacks to NVM and improve the DRAM hit ratio at the same time. Our evaluation shows that VAIL reduces writebacks to NVM by 17.2% for single-core workloads and by 22% for multi-core workloads, while improving the DRAM hit ratio of the hybrid memory system.
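Why writebacks dominate NVM wear can be seen in a toy write-back cache model (our illustration; VAIL's actual policy is more involved): only dirty victims cost an NVM write, so any policy that keeps write-hot lines in DRAM longer directly reduces wear.

```python
# Toy write-back DRAM cache in front of NVM with LRU replacement:
# clean victims are simply dropped, dirty victims cost an NVM write.
from collections import OrderedDict

class DramCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # addr -> dirty flag, in LRU order
        self.nvm_writebacks = 0

    def access(self, addr, is_write):
        if addr in self.lines:
            self.lines.move_to_end(addr)                    # refresh LRU
            self.lines[addr] = self.lines[addr] or is_write
            return
        if len(self.lines) >= self.capacity:
            victim, dirty = self.lines.popitem(last=False)  # evict LRU
            if dirty:                # only dirty victims hit the NVM
                self.nvm_writebacks += 1
        self.lines[addr] = is_write

cache = DramCache(capacity=2)
for addr, is_write in [(1, True), (2, False), (3, False), (1, False)]:
    cache.access(addr, is_write)
assert cache.nvm_writebacks == 1     # only dirty line 1 was written back
```

In the trace, evicting the dirty line 1 costs an NVM write while the clean line 2 is dropped for free; a victim-aware policy aims to bias eviction toward such free drops.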
Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers
J. Aliaga, M. Bollhöfer, Ernesto Dufrechu, P. Ezzatti, E. S. Quintana-Ortí
DOI: 10.1145/3178442.3178450

Abstract: We target the solution of sparse linear systems via iterative Krylov subspace methods enhanced with the ILUPACK preconditioner on graphics processing units (GPUs). Concretely, in this work we extend ILUPACK with an implementation of the BiCG solver capable of exploiting dual-GPU systems. We leverage the structure of BiCG to execute the main stages of the solver concurrently, and take advantage of the extended memory space to improve the data-access patterns. Experimental results on a server with two NVIDIA K40 GPUs show significant acceleration factors with respect to a previous single-GPU variant.
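The concurrency being exploited can be sketched abstractly (our illustration, not ILUPACK code; plain Python threads stand in for the two GPUs): in each BiCG iteration the products A*p and A^T*p_hat are mutually independent, so they can run on different devices.

```python
# BiCG needs both A*p and A^T*p_hat in every iteration; the two
# products are independent, which makes a dual-device split natural.
import threading

def matvec(A, x):
    # Dense mat-vec on a list-of-rows matrix (a stand-in for the
    # sparse, preconditioned operators in the real solver).
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def concurrent_products(A, p, p_hat):
    out = {}
    t1 = threading.Thread(target=lambda: out.update(Ap=matvec(A, p)))
    t2 = threading.Thread(
        target=lambda: out.update(Atp=matvec(transpose(A), p_hat)))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return out["Ap"], out["Atp"]

A = [[2, 0], [1, 3]]
Ap, Atp = concurrent_products(A, [1, 1], [1, 0])
assert Ap == [2, 4] and Atp == [2, 0]
```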
An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs
Du Shen, Milind Chabbi, Xu Liu
DOI: 10.1145/3178442.3178445

Abstract: Emerging high-performance processor architectures show two key trends: longer vector units and deeper memory hierarchies. It is not always possible to exploit both vectorization and locality. Prior optimization techniques have focused on either vectorization for data parallelism or cache reuse for low latency, ignoring the interference between the two. For high performance, software needs either to exploit both or to choose the one that offers larger gains despite the losses incurred by the other. This paper demonstrates that the vectorization-vs-locality tradeoff can be influenced by code shape, working-set size, and architecture. We first devise metrics to precisely classify these tradeoffs. We then design representative microbenchmarks to study the tradeoffs between vectorization and different types of locality on multiple architectures. Based on the insights from our microbenchmark studies, we optimize several important HPC benchmarks on multiple CPU architectures.
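One facet of this tradeoff can be sketched with a tiny trace model (our construction, not the paper's metrics): counting cache-line transitions in an access trace distinguishes a unit-stride traversal, which is both cache- and vector-friendly, from a strided one.

```python
# Count transitions between cache lines along a memory-access trace:
# fewer switches mean better spatial locality, and contiguous runs
# within a line map naturally onto vector loads.
def line_switches(trace, line_size=4):
    lines = [addr // line_size for addr in trace]
    return sum(1 for a, b in zip(lines, lines[1:]) if a != b)

N = 8
row_major = [r * N + c for r in range(N) for c in range(N)]   # unit stride
col_major = [r * N + c for c in range(N) for r in range(N)]   # stride N
assert line_switches(row_major) == 15   # one switch per 4-element line
assert line_switches(col_major) == 63   # a new line on every access
```

The same 64 accesses touch a new cache line 63 times in column order but only 15 times in row order, which is the kind of code-shape effect the paper's metrics quantify.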
Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution
J. Matejka, Björn Forsberg, M. Sojka, Z. Hanzálek, L. Benini, A. Marongiu
DOI: 10.1145/3178442.3178444

Abstract: Many applications require both high performance and predictable timing. High performance can be provided by COTS multi-core systems-on-chip (MPSoCs); however, as the cores in these systems share the memory bandwidth, they are susceptible to interference from each other, which is a problem for timing predictability. We achieve predictability on multi-cores by employing the predictable execution model (PREM), which splits execution into a sequence of memory and compute phases and schedules these such that only a single core executes a memory phase at a time. We present a toolchain consisting of a compiler and an Integer Linear Programming (ILP) scheduling model. Our compiler uses loop analysis and tiling to transform application code into PREM-compliant binaries. Furthermore, we solve the problem of scheduling execution on multiple cores while preventing interference between memory phases. We evaluate our toolchain on an Advanced Driver Assistance Systems (ADAS)-like scenario containing matrix multiplications and FFT computations on an NVIDIA TX1. The results show that our approach maintains similar average performance and reduces the variance of completion times by a factor of 9.
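The phase-splitting idea can be sketched in a few lines (our simplification; the actual toolchain compiles and schedules binaries, not Python): each task first runs a memory phase under mutual exclusion, then a compute phase that touches only local data.

```python
# PREM-style phase split: the memory phase of each task runs under a
# global lock (at most one "core" accesses shared memory at a time),
# while compute phases run freely on private data.
import threading

mem_lock = threading.Lock()
results = {}

def prem_task(tid, data):
    with mem_lock:                       # memory phase: exclusive access
        local = list(data)               # "prefetch" inputs into local buffer
    results[tid] = sum(x * x for x in local)   # compute phase: local only

threads = [threading.Thread(target=prem_task, args=(t, range(t + 3)))
           for t in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()
assert results == {0: 5, 1: 14, 2: 30}
```

Serializing only the memory phases is what removes memory interference while still letting the compute phases overlap across cores.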
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores
DOI: 10.1145/3178442
Published: February 24, 2018