{"title":"Deciphering Predictive Schedulers for Heterogeneous-ISA Multicore Architectures","authors":"A. Prodromou, A. Venkat, D. Tullsen","doi":"10.1145/3303084.3309492","DOIUrl":"https://doi.org/10.1145/3303084.3309492","url":null,"abstract":"Heterogeneous architectures have become increasingly common. From co-packaging small and large cores, to GPUs alongside CPUs, to general-purpose heterogeneous-ISA architectures with cores implementing different ISAs. As diversity of execution cores grows, predictive models become of paramount importance for scheduling and resource allocation. In this paper, we investigate the capabilities of performance predictors in a heterogeneous-ISA setting, as well as the predictors' effects on scheduler quality. We follow an unbiased feature selection methodology to identify the optimal set of features for this task, instead of pre-selecting features before training. We propose metrics that bridge the gap between traditional prediction accuracy metrics and a scheduler's performance. We further present our evaluation methodology, which was meticulously designed with this study in mind, and finally, we incorporate our findings in ML-based schedulers and evaluate their sensitivity to the underlying system's level of heterogeneity.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123555887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LiTM: A Lightweight Deterministic Software Transactional Memory System","authors":"Yuchong Xia, Xiangyao Yu, William S. Moses, Julian Shun, S. Devadas","doi":"10.1145/3303084.3309487","DOIUrl":"https://doi.org/10.1145/3303084.3309487","url":null,"abstract":"Deterministic software transactional memory (STM) is a useful programming model for writing parallel codes, as it improves programmability (by supporting transactions) and debuggability (by supporting determinism). This paper presents LiTM, a new deterministic STM system that achieves both simplicity and efficiency at the same time. LiTM implements the deterministic reservations framework of Blelloch et al., but without requiring the programmer to understand the internals of the algorithm. Instead, the programmer writes the program in a transactional fashion and LiTM manages all data conflicts and automatically achieves deterministic parallelism. Our experiments on six benchmarks show that LiTM outperforms the state-of-the-art framework Galois by up to 5.8× on a 40-core machine.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124120564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task-DAG Support in Single-Source PHAST Library: Enabling Flexible Assignment of Tasks to CPUs and GPUs in Heterogeneous Architectures","authors":"Biagio Peccerillo, S. Bartolini","doi":"10.1145/3303084.3309496","DOIUrl":"https://doi.org/10.1145/3303084.3309496","url":null,"abstract":"Nowadays, the majority of desktop, mobile, and embedded devices in the consumer and industrial markets are heterogeneous, as they contain at least multi-core CPU and GPU resources in the same system. However, exploiting the performance and energy-efficiency of these diverse processing elements does not come for free from a software point of view: programmers need to a) code each activity through the specific approaches, libraries, and frameworks suitable for their target architecture (e.g., CPUs and GPUs) along with the orchestration of such heterogeneous execution, and b) decide the distribution of sequential and parallel activities towards the different parallel hardware resources available. Current frameworks typically provide either low-abstraction-level target-specific and/or generic but not high-performance interfaces, which complicate the exploration of different task assignments, with DAG1 precedence relationship, to the available heterogeneous resources. To enable this, tasks would typically need to be coded one time for each target architecture due to the profound differences in their programming. In this work, we include the support of tasks and DAGs of data-parallel tasks within the single-source PHAST library, which currently supports both multi-core CPUs and NVIDIA GPUs, so that tasks are coded in a target-agnostic fashion and their targeting to multi-core or GPU architectures is automatic and efficient. The integration of this coding approach with tasks can help to postpone the choice of the execution platform for each task up to the testing, or even to the runtime, phase. Finally, we demonstrate the effects of this approach in the case of a sample image pipeline benchmark from the computer vision domain. We compare our implementation to a SYCL implementation from a productivity point of view. Also, we show that various task assignments can be seamlessly explored by implementing both the PEFT2 mapping technique along with an exhaustive search in the mapping space.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126501720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process Barrier for Predictable and Repeatable Concurrent Execution","authors":"Masataka Nishi","doi":"10.1145/3303084.3309494","DOIUrl":"https://doi.org/10.1145/3303084.3309494","url":null,"abstract":"We study on how to design, debug and verify and validate (V&V) safety-critical control software running on shared-memory many-core platforms. Managing concurrency in a verifiable way is a certification requirement. The presented process barrier is a simple concurrency control mechanism that guarantees deadlock-freedom by-design and temporal separation of tasks, while allowing non-conflicting tasks to run in parallel. It is placed in a lock-free task queue (LFTQ) and a group of processors are allocated to compete to dequeue and execute the tasks registered in the LFTQ. The process barrier consists of a checker and limiter pair. A process that dequeues the checker monitors for completion of preceding tasks in the LFTQ that conflicts with a subsequent task in the LFTQ. The process dequeues the paired limiter from the LFTQ upon completion. All other processes that find the limiter at the head of the LFTQ periodically checks if the head of the LFTQ points to subsequent tasks which happens after the process that took the checker task dequeues the limiter. The mechanism manages concurrent execution of the registered tasks that conflict on data, shared resources and execution order in a way that becomes conflict equivalent to sequential execution. The trace of the concurrent execution and the consequent program state is repeatable. We can reuse existing toolchains for single-core platforms for debugging, testing and V&V. The temporal behavior of the concurrent execution becomes predictable and the worst-case execution time (WCET) of it is bounded.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125494862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wait-free Dynamic Transactions for Linked Data Structures","authors":"P. Laborde, Lance Lebanoff, Christina L. Peterson, Deli Zhang, D. Dechev","doi":"10.1145/3303084.3309491","DOIUrl":"https://doi.org/10.1145/3303084.3309491","url":null,"abstract":"Transactional data structures support threads executing a sequence of operations atomically. Dynamic transactions allow operands to be generated on the fly and allows threads to execute code in between the operations of a transaction, in contrast to static transactions which need to know the operands in advance. A framework called Lock-free Transactional Transformation (LFTT) allows data structures to run high-performance transactions, but it only supports static transactions. We extend LFTT to add support for dynamic transactions and wait-free progress while retaining its speed. The thread-helping scheme of LFTT presents a unique challenge to dynamic transactions. We overcome this challenge by changing the input of LFTT from a list of operations to a function, forcing helping threads to always start at the beginning of the transaction, and allowing threads to skip completed operations through the use of a list of return values. We thoroughly evaluate the performance impact of support for dynamic transactions and wait-free progress and find that these features do not hurt the performance of LFTT for our test cases.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117248625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Don't Forget About Synchronization!: A Case Study of K-Means on GPU","authors":"J. Nelson, R. Palmieri","doi":"10.1145/3303084.3309488","DOIUrl":"https://doi.org/10.1145/3303084.3309488","url":null,"abstract":"Heterogeneous devices are becoming necessary components of high performance computing infrastructures, and the graphics processing unit (GPU) plays an important role in this landscape. Given a problem, the established approach for exploiting the GPU is to design solutions that are parallel, without data or flow dependencies. These solutions are then offloaded to the GPU's massively parallel capability. This design principle (i.e., avoiding contention) often leads to developing applications that cannot maximize GPU hardware utilization. The goal of this paper is to challenge this common belief by empirically showing that allowing even simple forms of synchronization enables programmers to design parallel solutions that admit conflicts and achieve better utilization of hardware parallelism. Our experience shows that lock-based solutions to the k-means clustering problem outperform the well-engineered and parallel KMCUDA on both synthetic and real datasets; averaging 8.4x faster runtimes at high contention and 8.1x faster for low contention, with maximums of 25.4x and 74x, respectively. We summarize our findings by identifying two guidelines to help make concurrency effective when programming GPU applications.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal Verification through Combinatorial Topology: the CAS-Extended Model","authors":"Christina L. Peterson, D. Dechev","doi":"10.1145/3303084.3309493","DOIUrl":"https://doi.org/10.1145/3303084.3309493","url":null,"abstract":"Wait-freedom guarantees that all processes complete their operations in a finite number of steps regardless of the delay of any process. Combinatorial topology has been proposed in the literature as a formal verification technique to prove the wait-free computability of decision tasks. Wait-freedom is proved through the properties of a static topological structure that expresses all possible combinations of execution paths of the protocol solving the decision task. The practical application of combinatorial topology as a formal verification technique is limited because the existing theory only considers protocols in which the manner of communication between processes is through read-write memory. This research proposes an extension to the existing theory, called the CAS-extended model. The extended theory includes Compare-And-Swap (CAS) and Load-Linked/Store-Conditional (LL/SC) which are atomic primitives used to achieve wait-freedom in state-of-the-art protocols. The CAS-extended model theory can be used to formally verify wait-free algorithms used in practice, such as concurrent data structures. We present new definitions detailing the construction of a protocol complex in the CAS-extended model. As a proof-of-concept, we formally verify a wait-free queue with three processes using the CAS-extended combinatorial topology.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115569594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","authors":"","doi":"10.1145/3303084","DOIUrl":"https://doi.org/10.1145/3303084","url":null,"abstract":"","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117234256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}