2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Publications, Page 8

A Novel Set of Directives for Multi-device Programming with OpenMP

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00075

Raul Torres, R. Ferrer, Xavier Teruel

Citations: 0

Exploiting High-Bandwidth Memory for FPGA-Acceleration of Inference on Sum-Product Networks

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00028

Lukas Weber, John M. Wirth, Lukas Sommer, A. Koch

Abstract: Due to the memory wall becoming increasingly problematic in high-performance computing, there is a steady push to improve memory architectures, mainly focusing on better bandwidth as well as latency. One of the results of this push is the development of High-Bandwidth Memory (HBM) which is an alternative to the regular DRAM typically used by accelerator-cards. This work adapts an existing accelerator architecture for inference on Sum-Product Networks (SPN) to exploit the HBM present on more recent high-performance FPGA-accelerator cards. The evaluation shows that the use of HBM enables almost linear scaling of the performance due to the embarrassingly parallel nature of batch-wise SPN inference. It is also shown that the only hindrance to this scaling is the limited bandwidth available for data-transfers between host and FPGA. Even with this bottleneck, the prior FPGA-based implementation is outperformed by up to 1.50x (geo.-mean 1.29x). Similarly, the CPU and GPU baselines are outperformed by up to 2.4x (geo.-mean 1.6x) and 8.4x (geo.-mean 6.9x) respectively. Based on the evaluation, the scaling potential of HBM-based FPGA-accelerators is explored to give an outlook on what is to come with future generations of PCIe-based interfaces.
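The abstract above reports speedups as a maximum together with a geometric mean ("geo.-mean"). As a minimal sketch of how such a summary figure is computed, the Python snippet below takes the geometric mean of a list of per-benchmark speedup ratios. The function name `geometric_mean` and the example values are illustrative assumptions; they are not data from the paper.

```python
import math


def geometric_mean(speedups):
    """Return the geometric mean of a non-empty list of positive ratios."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    # Averaging the logarithms avoids overflow when multiplying many ratios.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-benchmark speedups, NOT values taken from the paper.
example_speedups = [1.0, 2.0, 4.0]
print(geometric_mean(example_speedups))
```

The geometric mean is the usual choice for averaging ratios because it treats a 2x speedup and a 2x slowdown as cancelling each other out, which the arithmetic mean does not.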

Citations: 0

MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00174

Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova

Abstract: The training phase in Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for the parallelization of training is the so-called data parallel approach, based on the parallel training of the different inputs (typically images) and the aggregation of network weights with collective communications (AllReduce operation). The scalability of this approach is limited both by the memory available on each node and the networking capacities for collective operations. Recently, a parallel model approach has been proposed (PipeDream, Gpipe), in which the DNN weights are distributed and images are trained in a pipeline/stream manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined parallel model approach compared to PipeDream.
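To make the layer-placement problem in the abstract above more concrete, the following Python sketch solves a heavily simplified version of it with dynamic programming: split a chain of layers into contiguous pipeline stages so that the slowest stage is as fast as possible. This is not the MadPipe algorithm. It ignores the memory limits and communication costs that the paper addresses, and the function name `partition_layers` and the cost values are assumptions made for illustration.

```python
from functools import lru_cache
from itertools import accumulate


def partition_layers(layer_costs, num_stages):
    """Split layers into contiguous stages, minimising the largest stage cost.

    Returns (bottleneck, cuts), where cuts are the indices at which a new
    stage begins. Simplified model: compute cost only, no memory or
    communication terms.
    """
    n = len(layer_costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need between 1 and len(layer_costs) stages")
    # prefix[i] is the total cost of layers 0..i-1.
    prefix = [0] + list(accumulate(layer_costs))

    @lru_cache(maxsize=None)
    def best(end, stages):
        # Best placement of layers 0..end-1 onto `stages` stages.
        if stages == 1:
            return prefix[end], ()
        result = None
        # The last stage holds layers cut..end-1; earlier stages need
        # at least one layer each, hence cut >= stages - 1.
        for cut in range(stages - 1, end):
            head, cuts = best(cut, stages - 1)
            load = max(head, prefix[end] - prefix[cut])
            if result is None or load < result[0]:
                result = (load, cuts + (cut,))
        return result

    return best(n, num_stages)


# Hypothetical per-layer compute costs, chosen only for illustration.
bottleneck, cuts = partition_layers([4, 2, 3, 5, 1], 3)
print(bottleneck, cuts)
```

In a pipelined execution the throughput is governed by the slowest stage, which is why this sketch minimises the maximum stage cost rather than the sum.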

Citations: 1

An On-the-Fly Method to Exchange Vector Clocks in Distributed-Memory Programs

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00093

Simon Schwitanski, Felix Tomski, Joachim Protze, C. Terboven, Matthias S. Müller

Abstract: Vector clocks are logical timestamps used in correctness tools to analyze the happened-before relation between events in parallel program executions. In particular, race detectors use them to find concurrent conflicting memory accesses, and replay tools use them to reproduce or find alternative execution paths. To record the happened-before relation with vector clocks, tool developers have to consider the different synchronization concepts of a programming model, e.g., barriers, locks, or message exchanges. Especially in distributed-memory programs, various concepts result in explicit and implicit synchronization between processes. Previously implemented vector clock exchanges are often specific to a single programming model, and a translation to other programming models is not trivial. Consequently, analyses relying on the vector clock exchange remain model-specific. This paper proposes an abstraction layer for on-the-fly vector clock exchanges for distributed-memory programs. Based on the programming models MPI, OpenSHMEM, and GASPI, we define common synchronization primitives and explain how model-specific procedures map to our model-agnostic abstraction layer. The exchange model is general enough also to support synchronization concepts of other parallel programming models. We present our implementation of the vector clock abstraction layer based on the Generic Tool Infrastructure with translators for MPI and OpenSHMEM. In an overhead study using the SPEC MPI 2007 benchmarks, the slowdown of the implemented vector clock exchange ranges from 1.1x to 12.6x for runs with up to 768 processes.
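The vector clock mechanism that the abstract above builds on can be sketched in a few lines of Python. The example below shows the textbook update rules for a message exchange (increment on a local event, element-wise maximum on receive) and the resulting happened-before test. It is a generic illustration and not the paper's abstraction layer; the class `VectorClock` and the helper functions are hypothetical names, and no MPI, OpenSHMEM, or GASPI calls are involved.

```python
class VectorClock:
    """Textbook vector clock for one process in a fixed-size group."""

    def __init__(self, num_processes, rank):
        self.rank = rank
        self.clock = [0] * num_processes

    def tick(self):
        # A local event advances only this process's own component.
        self.clock[self.rank] += 1

    def send(self):
        # Sending is an event; the returned copy travels with the message.
        self.tick()
        return list(self.clock)

    def receive(self, received):
        # Merge knowledge from the sender, then count the receive event.
        self.clock = [max(a, b) for a, b in zip(self.clock, received)]
        self.tick()


def happened_before(a, b):
    """True if timestamp a is component-wise <= b and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp is ordered before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
p0.tick()
e0 = list(p0.clock)   # local event on process 0
p1.tick()
e1 = list(p1.clock)   # independent local event on process 1
p1.receive(p0.send())
e2 = list(p1.clock)   # receive event on process 1
print(concurrent(e0, e1), happened_before(e0, e2))
```

A race detector in this style would flag two conflicting memory accesses only when their timestamps are concurrent, as `e0` and `e1` are here; after the message exchange, `e0` is ordered before `e2`.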

Citations: 1

IPDPS 2022 PhD Forum

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00224

S. Bhowmick, Anne-Cécile Orgerie

Citations: 0

A Parallel Novelty Search Metaheuristic Applied to a Wildfire Prediction System

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.48550/arXiv.2207.11646

Jan Strappa, Paola Caymes-Scutari, G. Bianchini

Abstract: Wildfires are a highly prevalent multi-causal environmental phenomenon. The impact of this phenomenon includes human losses, environmental damage and high economic costs. To mitigate these effects, several computer simulation systems have been developed in order to predict fire behavior based on a set of input parameters, also called a scenario (wind speed and direction; temperature; etc.). However, the results of a simulation usually have a high degree of error due to the uncertainty in the values of some variables, because they are not known, or because their measurement may be imprecise, erroneous, or impossible to perform in real time. Previous works have proposed the combination of multiple results in order to reduce this uncertainty. State-of-the-art methods are based on parallel optimization strategies that use a fitness function to guide the search among all possible scenarios. Although these methods have shown improvements in the quality of predictions, they have some limitations related to the algorithms used for the selection of scenarios. To overcome these limitations, in this work we propose to apply the Novelty Search paradigm, which replaces the objective function by a measure of the novelty of the solutions found, which allows the search to continuously generate solutions with behaviors that differ from one another. This approach avoids local optima and may be able to find useful solutions that would be difficult or impossible to find by other algorithms. As with existing methods, this proposal may also be adapted to other propagation models (floods, avalanches or landslides).
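The core idea of Novelty Search described in the abstract above, selecting by novelty instead of by an objective function, can be illustrated with a small sequential Python sketch. It scores each individual by its mean distance to its k nearest neighbours among the rest of the population and an archive of past novel individuals. This is a generic toy version, not the parallel method or the wildfire simulator from the paper: individuals are plain numbers, the behavior of an individual is assumed to be its own value, and all names and parameters are hypothetical.

```python
import random


def novelty(behavior, others, k=3):
    """Mean distance from `behavior` to its k nearest neighbours in `others`."""
    nearest = sorted(abs(behavior - o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def novelty_search(generations=20, pop_size=10, k=3, seed=0):
    """Toy novelty search on real numbers; returns the archive of novel finds."""
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    archive = []
    for _ in range(generations):
        scored = []
        for i, individual in enumerate(population):
            # Compare against everyone else plus all archived individuals.
            others = population[:i] + population[i + 1:] + archive
            scored.append((novelty(individual, others, k), individual))
        scored.sort(reverse=True)
        # Remember the most novel individual of this generation.
        archive.append(scored[0][1])
        # Selection uses novelty only; no objective function is consulted.
        parents = [ind for _, ind in scored[:max(1, pop_size // 2)]]
        children = [rng.choice(parents) + rng.gauss(0.0, 0.5)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return archive


archive = novelty_search()
print(len(archive), min(archive), max(archive))
```

Because selection rewards being far from what has already been seen, the archive tends to spread over the search space rather than converge on one region, which is the property the abstract credits with avoiding local optima.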

Citations: 0

Exploration Framework for Synthesizable CGRAs Targeting HPC: Initial Design and Evaluation

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00113

B. Adhi, Carlos Cortes, Y. Tan, Takuya Kojima, Artur Podobas, K. Sano

Citations: 5

Separated Allocator Metadata in Disaggregated In-Memory Databases: Friend or Foe?

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00207

Marcel Weisgut, Daniel Ritter, Martin Boissier, M. Perscheid

Citations: 2

29th Reconfigurable Architectures Workshop (RAW 2022)

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00018

J. Becker, Lana Josipović, V. Prasanna, M. Santambrogio, R. Vaidyanathan

Citations: 0

Quantifying Composable Data Center Utilization

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00206

M. Taubenblatt, A. Tantawi

Citations: 0