Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores: Latest Publications

Deciphering Predictive Schedulers for Heterogeneous-ISA Multicore Architectures
A. Prodromou, A. Venkat, D. Tullsen
{"title":"Deciphering Predictive Schedulers for Heterogeneous-ISA Multicore Architectures","authors":"A. Prodromou, A. Venkat, D. Tullsen","doi":"10.1145/3303084.3309492","DOIUrl":"https://doi.org/10.1145/3303084.3309492","url":null,"abstract":"Heterogeneous architectures have become increasingly common. From co-packaging small and large cores, to GPUs alongside CPUs, to general-purpose heterogeneous-ISA architectures with cores implementing different ISAs. As diversity of execution cores grows, predictive models become of paramount importance for scheduling and resource allocation. In this paper, we investigate the capabilities of performance predictors in a heterogeneous-ISA setting, as well as the predictors' effects on scheduler quality. We follow an unbiased feature selection methodology to identify the optimal set of features for this task, instead of pre-selecting features before training. We propose metrics that bridge the gap between traditional prediction accuracy metrics and a scheduler's performance. We further present our evaluation methodology, which was meticulously designed with this study in mind, and finally, we incorporate our findings in ML-based schedulers and evaluate their sensitivity to the underlying system's level of heterogeneity.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123555887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
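As a rough illustration of the kind of predictive, ML-driven scheduler the abstract studies (not the authors' model, features, or feature-selection methodology), the C++ sketch below scores each thread on each core type with a stub predictor and greedily places every thread where its predicted speedup is highest. `ThreadFeatures`, `predict_speedup`, and the toy weights are all hypothetical.

```cpp
// Minimal sketch (not the paper's model): a predictive scheduler that assigns
// each thread to the core type where a stub predictor expects the best speedup.
#include <cstdio>
#include <vector>

struct ThreadFeatures {
    double ipc_sample;        // instructions per cycle observed on the current core
    double branch_miss_rate;
    double l2_miss_rate;
};

// Hypothetical stand-in for a trained ML model: predicted speedup of running
// the thread on core type `core`.
double predict_speedup(const ThreadFeatures& f, int core) {
    // Toy linear model; a real predictor would be trained offline.
    const double weights[2][3] = {{0.9, -0.5, -1.2},   // core type 0 (little)
                                  {1.4, -1.0, -2.0}};  // core type 1 (big)
    return weights[core][0] * f.ipc_sample
         + weights[core][1] * f.branch_miss_rate
         + weights[core][2] * f.l2_miss_rate;
}

// Greedy placement: each thread goes to the core type with the best prediction.
std::vector<int> schedule(const std::vector<ThreadFeatures>& threads, int num_core_types) {
    std::vector<int> placement(threads.size());
    for (size_t t = 0; t < threads.size(); ++t) {
        int best = 0;
        for (int c = 1; c < num_core_types; ++c)
            if (predict_speedup(threads[t], c) > predict_speedup(threads[t], best))
                best = c;
        placement[t] = best;
    }
    return placement;
}

int main() {
    std::vector<ThreadFeatures> threads = {{1.2, 0.05, 0.10}, {0.6, 0.20, 0.30}};
    std::vector<int> placement = schedule(threads, 2);
    for (size_t t = 0; t < threads.size(); ++t)
        std::printf("thread %zu -> core type %d\n", t, placement[t]);
}
```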
LiTM: A Lightweight Deterministic Software Transactional Memory System
Yuchong Xia, Xiangyao Yu, William S. Moses, Julian Shun, S. Devadas
{"title":"LiTM: A Lightweight Deterministic Software Transactional Memory System","authors":"Yuchong Xia, Xiangyao Yu, William S. Moses, Julian Shun, S. Devadas","doi":"10.1145/3303084.3309487","DOIUrl":"https://doi.org/10.1145/3303084.3309487","url":null,"abstract":"Deterministic software transactional memory (STM) is a useful programming model for writing parallel codes, as it improves programmability (by supporting transactions) and debuggability (by supporting determinism). This paper presents LiTM, a new deterministic STM system that achieves both simplicity and efficiency at the same time. LiTM implements the deterministic reservations framework of Blelloch et al., but without requiring the programmer to understand the internals of the algorithm. Instead, the programmer writes the program in a transactional fashion and LiTM manages all data conflicts and automatically achieves deterministic parallelism. Our experiments on six benchmarks show that LiTM outperforms the state-of-the-art framework Galois by up to 5.8× on a 40-core machine.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124120564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
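LiTM builds on the deterministic reservations framework of Blelloch et al. As background only (this is not LiTM's interface), the sketch below shows that pattern on a toy problem: elements compete to claim cells, each round every pending element reserves its target cell using its index as priority, and only the lowest index commits, so the final ownership is independent of thread timing. The reserve and commit loops are written sequentially here for clarity; in a real implementation each would be a parallel loop.

```cpp
// Toy deterministic-reservations round structure (illustration, not LiTM's API).
#include <atomic>
#include <climits>
#include <cstdio>
#include <vector>

int main() {
    const int kCells = 3;
    // Each element wants to claim one cell; lower element index = higher priority.
    std::vector<int> target = {0, 2, 0, 2, 1};
    std::vector<std::atomic<int>> reservation(kCells);
    std::vector<int> owner(kCells, -1);
    std::vector<bool> done(target.size(), false);

    bool pending = true;
    while (pending) {
        pending = false;
        for (auto& r : reservation) r.store(INT_MAX);

        // Reserve phase: every pending element writes the minimum index seen so far.
        for (int i = 0; i < (int)target.size(); ++i) {
            if (done[i]) continue;
            int cell = target[i], cur = reservation[cell].load();
            while (i < cur && !reservation[cell].compare_exchange_weak(cur, i)) {}
        }
        // Commit phase: winners claim their cell; losers either fail (cell taken)
        // or retry in the next round.
        for (int i = 0; i < (int)target.size(); ++i) {
            if (done[i]) continue;
            int cell = target[i];
            if (owner[cell] != -1)                      done[i] = true;   // lost: already claimed
            else if (reservation[cell].load() == i) { owner[cell] = i; done[i] = true; }
            else                                       pending = true;    // retry next round
        }
    }
    for (int c = 0; c < kCells; ++c)
        std::printf("cell %d owned by element %d\n", c, owner[c]);
}
```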
Task-DAG Support in Single-Source PHAST Library: Enabling Flexible Assignment of Tasks to CPUs and GPUs in Heterogeneous Architectures
Biagio Peccerillo, S. Bartolini
{"title":"Task-DAG Support in Single-Source PHAST Library: Enabling Flexible Assignment of Tasks to CPUs and GPUs in Heterogeneous Architectures","authors":"Biagio Peccerillo, S. Bartolini","doi":"10.1145/3303084.3309496","DOIUrl":"https://doi.org/10.1145/3303084.3309496","url":null,"abstract":"Nowadays, the majority of desktop, mobile, and embedded devices in the consumer and industrial markets are heterogeneous, as they contain at least multi-core CPU and GPU resources in the same system. However, exploiting the performance and energy-efficiency of these diverse processing elements does not come for free from a software point of view: programmers need to a) code each activity through the specific approaches, libraries, and frameworks suitable for their target architecture (e.g., CPUs and GPUs) along with the orchestration of such heterogeneous execution, and b) decide the distribution of sequential and parallel activities towards the different parallel hardware resources available. Current frameworks typically provide either low-abstraction-level target-specific and/or generic but not high-performance interfaces, which complicate the exploration of different task assignments, with DAG1 precedence relationship, to the available heterogeneous resources. To enable this, tasks would typically need to be coded one time for each target architecture due to the profound differences in their programming. In this work, we include the support of tasks and DAGs of data-parallel tasks within the single-source PHAST library, which currently supports both multi-core CPUs and NVIDIA GPUs, so that tasks are coded in a target-agnostic fashion and their targeting to multi-core or GPU architectures is automatic and efficient. The integration of this coding approach with tasks can help to postpone the choice of the execution platform for each task up to the testing, or even to the runtime, phase. Finally, we demonstrate the effects of this approach in the case of a sample image pipeline benchmark from the computer vision domain. We compare our implementation to a SYCL implementation from a productivity point of view. Also, we show that various task assignments can be seamlessly explored by implementing both the PEFT2 mapping technique along with an exhaustive search in the mapping space.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126501720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
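The abstract's key idea is that a task's body is written once, in a target-agnostic way, and its assignment to a CPU or GPU is decided later, per task, subject to DAG precedence constraints. The sketch below is a generic illustration of that separation (it is not the PHAST API): tasks carry a device-agnostic body and a dependency list, and the device mapping is supplied at run time, so different mappings (e.g., one produced by PEFT or by exhaustive search) can be swapped without touching the task code.

```cpp
// Generic illustration (not the PHAST API): a tiny DAG of tasks whose device
// assignment is a run-time parameter rather than being hard-coded per task.
#include <cstdio>
#include <functional>
#include <vector>

enum class Device { CPU, GPU };

struct Task {
    const char* name;
    std::function<void(Device)> body;   // target-agnostic body; the device is a parameter
    std::vector<int> deps;              // indices of tasks that must finish first
};

// Execute the DAG respecting precedence (sequentially here, for clarity).
void run_dag(const std::vector<Task>& dag, const std::vector<Device>& mapping) {
    std::vector<bool> finished(dag.size(), false);
    size_t remaining = dag.size();
    while (remaining > 0) {
        for (size_t i = 0; i < dag.size(); ++i) {
            if (finished[i]) continue;
            bool ready = true;
            for (int d : dag[i].deps) ready = ready && finished[d];
            if (!ready) continue;
            dag[i].body(mapping[i]);
            finished[i] = true;
            --remaining;
        }
    }
}

int main() {
    auto stage = [](const char* name) {
        return [name](Device dev) {
            std::printf("%s on %s\n", name, dev == Device::GPU ? "GPU" : "CPU");
        };
    };
    // A miniature image pipeline: decode -> {blur, edges} -> compose.
    std::vector<Task> dag = {
        {"decode",  stage("decode"),  {}},
        {"blur",    stage("blur"),    {0}},
        {"edges",   stage("edges"),   {0}},
        {"compose", stage("compose"), {1, 2}},
    };
    // The mapping can be swapped freely without changing any task body.
    run_dag(dag, {Device::CPU, Device::GPU, Device::GPU, Device::CPU});
}
```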
Process Barrier for Predictable and Repeatable Concurrent Execution
Masataka Nishi
{"title":"Process Barrier for Predictable and Repeatable Concurrent Execution","authors":"Masataka Nishi","doi":"10.1145/3303084.3309494","DOIUrl":"https://doi.org/10.1145/3303084.3309494","url":null,"abstract":"We study on how to design, debug and verify and validate (V&V) safety-critical control software running on shared-memory many-core platforms. Managing concurrency in a verifiable way is a certification requirement. The presented process barrier is a simple concurrency control mechanism that guarantees deadlock-freedom by-design and temporal separation of tasks, while allowing non-conflicting tasks to run in parallel. It is placed in a lock-free task queue (LFTQ) and a group of processors are allocated to compete to dequeue and execute the tasks registered in the LFTQ. The process barrier consists of a checker and limiter pair. A process that dequeues the checker monitors for completion of preceding tasks in the LFTQ that conflicts with a subsequent task in the LFTQ. The process dequeues the paired limiter from the LFTQ upon completion. All other processes that find the limiter at the head of the LFTQ periodically checks if the head of the LFTQ points to subsequent tasks which happens after the process that took the checker task dequeues the limiter. The mechanism manages concurrent execution of the registered tasks that conflict on data, shared resources and execution order in a way that becomes conflict equivalent to sequential execution. The trace of the concurrent execution and the consequent program state is repeatable. We can reuse existing toolchains for single-core platforms for debugging, testing and V&V. The temporal behavior of the concurrent execution becomes predictable and the worst-case execution time (WCET) of it is bounded.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125494862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
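To make the checker/limiter mechanism concrete, here is a heavily simplified C++ sketch of the idea as described in the abstract. It is not the paper's implementation: the task queue is guarded by a mutex rather than being lock-free, and "completion of preceding conflicting tasks" is approximated by a simple count of completed tasks. The worker that dequeues the checker waits for that count and then removes the paired limiter; every other worker that finds the limiter at the head keeps retrying until the head advances.

```cpp
// Simplified checker/limiter barrier in a shared task queue (illustration only).
#include <atomic>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct Item {
    enum Kind { Work, Checker, Limiter } kind;
    std::function<void()> fn;   // valid for Work
    int wait_for = 0;           // for Checker: number of preceding tasks to wait for
};

std::deque<Item> queue_;
std::mutex mtx_;
std::atomic<int> completed_{0};

void worker() {
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx_);
        if (queue_.empty()) return;
        Item head = queue_.front();
        if (head.kind == Item::Limiter) {            // barrier not yet released: retry later
            lk.unlock();
            std::this_thread::yield();
            continue;
        }
        queue_.pop_front();                           // take a Work item or the Checker
        lk.unlock();
        if (head.kind == Item::Checker) {
            while (completed_.load() < head.wait_for) std::this_thread::yield();
            std::lock_guard<std::mutex> g(mtx_);      // preceding tasks done: release barrier
            queue_.pop_front();                       // the paired Limiter is now at the head
        } else {
            head.fn();
            completed_.fetch_add(1);
        }
    }
}

int main() {
    auto task = [](int id) { return [id] { std::printf("task %d\n", id); }; };
    // Two independent tasks, a barrier, then a task that conflicts with both.
    queue_ = { {Item::Work, task(1)}, {Item::Work, task(2)},
               {Item::Checker, {}, 2}, {Item::Limiter, {}},
               {Item::Work, task(3)} };
    std::vector<std::thread> pool;
    for (int i = 0; i < 3; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```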
Wait-free Dynamic Transactions for Linked Data Structures
P. Laborde, Lance Lebanoff, Christina L. Peterson, Deli Zhang, D. Dechev
{"title":"Wait-free Dynamic Transactions for Linked Data Structures","authors":"P. Laborde, Lance Lebanoff, Christina L. Peterson, Deli Zhang, D. Dechev","doi":"10.1145/3303084.3309491","DOIUrl":"https://doi.org/10.1145/3303084.3309491","url":null,"abstract":"Transactional data structures support threads executing a sequence of operations atomically. Dynamic transactions allow operands to be generated on the fly and allows threads to execute code in between the operations of a transaction, in contrast to static transactions which need to know the operands in advance. A framework called Lock-free Transactional Transformation (LFTT) allows data structures to run high-performance transactions, but it only supports static transactions. We extend LFTT to add support for dynamic transactions and wait-free progress while retaining its speed. The thread-helping scheme of LFTT presents a unique challenge to dynamic transactions. We overcome this challenge by changing the input of LFTT from a list of operations to a function, forcing helping threads to always start at the beginning of the transaction, and allowing threads to skip completed operations through the use of a list of return values. We thoroughly evaluate the performance impact of support for dynamic transactions and wait-free progress and find that these features do not hurt the performance of LFTT for our test cases.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117248625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
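The abstract describes turning a transaction into a function plus a log of return values, so that a helping thread can re-execute the function from the beginning and skip operations that have already completed. The single-threaded sketch below illustrates only that replay-with-result-log idea; it is not LFTT itself, and the `TxDescriptor`/`tx_insert` names are hypothetical. The dynamic transaction computes its second operand from the result of the first, which a static operation list could not express.

```cpp
// Replay-with-result-log illustration (not the LFTT implementation).
#include <cstdio>
#include <set>
#include <vector>

struct TxDescriptor {
    std::vector<bool> results;   // return values of operations completed so far
    int next = 0;                // index of the next operation in this execution
};

std::set<int> shared_set;        // stand-in for a transactional data structure

// Each operation consults the log first, so a re-execution skips completed work.
bool tx_insert(TxDescriptor& d, int key) {
    int idx = d.next++;
    if (idx < (int)d.results.size()) return d.results[idx];   // already executed: reuse result
    bool ok = shared_set.insert(key).second;
    d.results.push_back(ok);
    return ok;
}

int main() {
    // Dynamic transaction: the second operand depends on the first result.
    auto txn = [](TxDescriptor& d) {
        bool inserted = tx_insert(d, 10);
        tx_insert(d, inserted ? 20 : 30);    // operand generated on the fly
    };

    TxDescriptor d;
    txn(d);          // original execution performs both inserts
    d.next = 0;
    txn(d);          // a "helping" re-execution starts over but skips both operations
    for (int k : shared_set) std::printf("%d ", k);
    std::printf("\n");                        // prints: 10 20
}
```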
Don't Forget About Synchronization!: A Case Study of K-Means on GPU
J. Nelson, R. Palmieri
{"title":"Don't Forget About Synchronization!: A Case Study of K-Means on GPU","authors":"J. Nelson, R. Palmieri","doi":"10.1145/3303084.3309488","DOIUrl":"https://doi.org/10.1145/3303084.3309488","url":null,"abstract":"Heterogeneous devices are becoming necessary components of high performance computing infrastructures, and the graphics processing unit (GPU) plays an important role in this landscape. Given a problem, the established approach for exploiting the GPU is to design solutions that are parallel, without data or flow dependencies. These solutions are then offloaded to the GPU's massively parallel capability. This design principle (i.e., avoiding contention) often leads to developing applications that cannot maximize GPU hardware utilization. The goal of this paper is to challenge this common belief by empirically showing that allowing even simple forms of synchronization enables programmers to design parallel solutions that admit conflicts and achieve better utilization of hardware parallelism. Our experience shows that lock-based solutions to the k-means clustering problem outperform the well-engineered and parallel KMCUDA on both synthetic and real datasets; averaging 8.4x faster runtimes at high contention and 8.1x faster for low contention, with maximums of 25.4x and 74x, respectively. We summarize our findings by identifying two guidelines to help make concurrency effective when programming GPU applications.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
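As a CPU-side analogue of the contention-friendly design the abstract argues for (not KMCUDA and not the paper's GPU kernels), the sketch below writes the k-means update step so that threads deliberately share per-centroid accumulators, each guarded by its own lock, instead of privatizing partial sums and reducing. On a GPU the same idea would typically use atomics or per-block spinlocks.

```cpp
// Lock-based k-means update step on 1-D points (CPU analogue, illustration only).
#include <cmath>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

struct Accum { std::mutex lock; double sum = 0.0; long count = 0; };

int main() {
    const int k = 2, nthreads = 4;
    std::vector<double> points = {0.1, 0.2, 0.15, 5.0, 5.2, 4.9, 0.05, 5.1};
    std::vector<double> centroids = {0.0, 5.0};
    std::vector<Accum> acc(k);   // shared, contended accumulators

    auto work = [&](int tid) {
        for (size_t i = tid; i < points.size(); i += nthreads) {
            // Assign the point to its nearest centroid.
            int best = 0;
            for (int c = 1; c < k; ++c)
                if (std::fabs(points[i] - centroids[c]) < std::fabs(points[i] - centroids[best]))
                    best = c;
            // Contended update, serialized by a per-centroid lock.
            std::lock_guard<std::mutex> g(acc[best].lock);
            acc[best].sum += points[i];
            acc[best].count += 1;
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) pool.emplace_back(work, t);
    for (auto& t : pool) t.join();

    for (int c = 0; c < k; ++c)
        std::printf("centroid %d -> %.3f\n", c,
                    acc[c].count ? acc[c].sum / acc[c].count : centroids[c]);
}
```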
Formal Verification through Combinatorial Topology: the CAS-Extended Model
Christina L. Peterson, D. Dechev
{"title":"Formal Verification through Combinatorial Topology: the CAS-Extended Model","authors":"Christina L. Peterson, D. Dechev","doi":"10.1145/3303084.3309493","DOIUrl":"https://doi.org/10.1145/3303084.3309493","url":null,"abstract":"Wait-freedom guarantees that all processes complete their operations in a finite number of steps regardless of the delay of any process. Combinatorial topology has been proposed in the literature as a formal verification technique to prove the wait-free computability of decision tasks. Wait-freedom is proved through the properties of a static topological structure that expresses all possible combinations of execution paths of the protocol solving the decision task. The practical application of combinatorial topology as a formal verification technique is limited because the existing theory only considers protocols in which the manner of communication between processes is through read-write memory. This research proposes an extension to the existing theory, called the CAS-extended model. The extended theory includes Compare-And-Swap (CAS) and Load-Linked/Store-Conditional (LL/SC) which are atomic primitives used to achieve wait-freedom in state-of-the-art protocols. The CAS-extended model theory can be used to formally verify wait-free algorithms used in practice, such as concurrent data structures. We present new definitions detailing the construction of a protocol complex in the CAS-extended model. As a proof-of-concept, we formally verify a wait-free queue with three processes using the CAS-extended combinatorial topology.","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115569594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores
{"title":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","authors":"","doi":"10.1145/3303084","DOIUrl":"https://doi.org/10.1145/3303084","url":null,"abstract":"","PeriodicalId":408167,"journal":{"name":"Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117234256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1