2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Publications

A Novel Set of Directives for Multi-device Programming with OpenMP
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00075
Raul Torres, R. Ferrer, Xavier Teruel
Abstract: The latest versions of OpenMP support offloading execution to the accelerator devices present in a variety of heterogeneous architectures via the target directives. However, these directives can only refer to one device at a time, which makes multi-device programming an explicit and tedious task. In this work, we present an extension of OpenMP in the form of a new set of directives (target spread directives) which offers direct support for multiple devices and allows the distribution of data and/or workload among them without explicit programming. This results in an additional level of parallelism between the host and the devices. The target spread directives were evaluated using the Somier micro-app in a PowerPC cluster node with up to four Nvidia Tesla V100 GPUs. The results showed a speedup of approximately 2x using four GPUs and the new directive set, in comparison with the baseline implementation which used one GPU and the existing target directive set.
Citations: 0
Exploiting High-Bandwidth Memory for FPGA-Acceleration of Inference on Sum-Product Networks
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00028
Lukas Weber, John M. Wirth, Lukas Sommer, A. Koch
Abstract: Due to the memory wall becoming increasingly problematic in high-performance computing, there is a steady push to improve memory architectures, mainly focusing on better bandwidth as well as latency. One of the results of this push is the development of High-Bandwidth Memory (HBM), which is an alternative to the regular DRAM typically used by accelerator cards. This work adapts an existing accelerator architecture for inference on Sum-Product Networks (SPN) to exploit the HBM present on more recent high-performance FPGA-accelerator cards. The evaluation shows that the use of HBM enables almost linear scaling of the performance due to the embarrassingly parallel nature of batch-wise SPN inference. It is also shown that the only hindrance to this scaling is the limited bandwidth available for data transfers between host and FPGA. Even with this bottleneck, the prior FPGA-based implementation is outperformed by up to 1.50x (geo.-mean 1.29x). Similarly, the CPU and GPU baselines are outperformed by up to 2.4x (geo.-mean 1.6x) and 8.4x (geo.-mean 6.9x) respectively. Based on the evaluation, the scaling potential of HBM-based FPGA-accelerators is explored to give an outlook on what is to come with future generations of PCIe-based interfaces.
Citations: 0
MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00174
Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova
Abstract: The training phase in Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for the parallelization of training is the so-called data parallel approach, based on the parallel training of the different inputs (typically images) and the aggregation of network weights with collective communications (AllReduce operation). The scalability of this approach is limited both by the memory available on each node and the networking capacities for collective operations. Recently, a model parallel approach has been proposed (PipeDream, GPipe), in which the DNN weights are distributed and images are trained in a pipeline/stream manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined model parallel approach compared to PipeDream.
Citations: 1
An On-the-Fly Method to Exchange Vector Clocks in Distributed-Memory Programs
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00093
Simon Schwitanski, Felix Tomski, Joachim Protze, C. Terboven, Matthias S. Müller
Abstract: Vector clocks are logical timestamps used in correctness tools to analyze the happened-before relation between events in parallel program executions. In particular, race detectors use them to find concurrent conflicting memory accesses, and replay tools use them to reproduce or find alternative execution paths. To record the happened-before relation with vector clocks, tool developers have to consider the different synchronization concepts of a programming model, e.g., barriers, locks, or message exchanges. Especially in distributed-memory programs, various concepts result in explicit and implicit synchronization between processes. Previously implemented vector clock exchanges are often specific to a single programming model, and a translation to other programming models is not trivial. Consequently, analyses relying on the vector clock exchange remain model-specific. This paper proposes an abstraction layer for on-the-fly vector clock exchanges for distributed-memory programs. Based on the programming models MPI, OpenSHMEM, and GASPI, we define common synchronization primitives and explain how model-specific procedures map to our model-agnostic abstraction layer. The exchange model is general enough to also support synchronization concepts of other parallel programming models. We present our implementation of the vector clock abstraction layer based on the Generic Tool Infrastructure with translators for MPI and OpenSHMEM. In an overhead study using the SPEC MPI 2007 benchmarks, the slowdown of the implemented vector clock exchange ranges from 1.1x to 12.6x for runs with up to 768 processes.
Citations: 1
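As a rough illustration of the vector clocks described in the entry above, the following Python sketch shows how a process might merge a received clock on a message exchange and how a tool could test the happened-before relation. The class and function names are hypothetical; they are not taken from the paper or from the Generic Tool Infrastructure, and the sketch covers only plain point-to-point messages.

```python
from dataclasses import dataclass, field


@dataclass
class VectorClock:
    """Logical timestamp of one process among `size` processes (illustrative only)."""
    rank: int
    size: int
    clock: list = field(default_factory=list)

    def __post_init__(self):
        if not self.clock:
            self.clock = [0] * self.size

    def tick(self):
        """Advance the local component for a local event or a send."""
        self.clock[self.rank] += 1
        return list(self.clock)  # snapshot that would travel with a message

    def merge(self, received):
        """On receive: element-wise maximum with the sender's clock, then tick."""
        self.clock = [max(a, b) for a, b in zip(self.clock, received)]
        self.clock[self.rank] += 1
        return list(self.clock)


def happened_before(a, b):
    """True if timestamp a precedes b: a <= b element-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """Two events are concurrent if neither happened before the other."""
    return not happened_before(a, b) and not happened_before(b, a)


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
send_ts = p0.tick()          # P0 sends a message: [1, 0]
local_ts = p1.tick()         # independent local event on P1: [0, 1]
recv_ts = p1.merge(send_ts)  # P1 receives the message: [1, 2]

print(happened_before(send_ts, recv_ts))  # True: the send precedes the receive
print(concurrent(send_ts, local_ts))      # True: a race detector would flag conflicting accesses here
```

Collective operations, locks, and one-sided communication would need their own merge rules, which is the kind of mapping the abstract says the abstraction layer provides.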
IPDPS 2022 PhD Forum
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00224
S. Bhowmick, Anne-Cécile Orgerie
Abstract: The IEEE Computer Society has created the Technical Consortium on High Performance Computing (TCHPC) to advance and coordinate work in the field of high performance computing networking, storage, and analysis concepts, technologies and applications, and to expand the IEEE's role in this interdisciplinary and pervasive field. The Consortium has launched an Education and Outreach Initiative to coordinate activities, information, and best practices around HPC education/outreach across its member technical committees and the broader community. This includes the coordination of student activities across conferences, and the lead chair for this initiative is also chair of the IPDPS 2022 PhD Forum.
Citations: 0
A Parallel Novelty Search Metaheuristic Applied to a Wildfire Prediction System
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.48550/arXiv.2207.11646
Jan Strappa, Paola Caymes-Scutari, G. Bianchini
Abstract: Wildfires are a highly prevalent multi-causal environmental phenomenon. The impact of this phenomenon includes human losses, environmental damage and high economic costs. To mitigate these effects, several computer simulation systems have been developed in order to predict fire behavior based on a set of input parameters, also called a scenario (wind speed and direction; temperature; etc.). However, the results of a simulation usually have a high degree of error due to the uncertainty in the values of some variables, because they are not known, or because their measurement may be imprecise, erroneous, or impossible to perform in real time. Previous works have proposed the combination of multiple results in order to reduce this uncertainty. State-of-the-art methods are based on parallel optimization strategies that use a fitness function to guide the search among all possible scenarios. Although these methods have shown improvements in the quality of predictions, they have some limitations related to the algorithms used for the selection of scenarios. To overcome these limitations, in this work we propose to apply the Novelty Search paradigm, which replaces the objective function by a measure of the novelty of the solutions found, which allows the search to continuously generate solutions with behaviors that differ from one another. This approach avoids local optima and may be able to find useful solutions that would be difficult or impossible to find by other algorithms. As with existing methods, this proposal may also be adapted to other propagation models (floods, avalanches or landslides).
Citations: 0
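To make the idea of replacing the objective function with a novelty measure more concrete, here is a minimal Python sketch of one common novelty score: the mean distance from a candidate's behavior to its k nearest neighbors in the current population plus an archive. The abstract does not say how the paper characterizes behaviors or computes novelty, so the two-dimensional behavior tuples, the value of `k`, and both function names are assumptions made purely for illustration.

```python
import math


def novelty(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors."""
    dists = sorted(math.dist(behavior, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def rank_by_novelty(population, archive, k=3):
    """Score each candidate against the rest of the population and the archive."""
    scored = []
    for i, b in enumerate(population):
        others = population[:i] + population[i + 1:] + archive
        scored.append((novelty(b, others, k), b))
    # Most novel first; selection favors behaviors unlike anything seen so far.
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


# Hypothetical behavior descriptors (e.g. two summary numbers per simulated scenario).
population = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
archive = [(0.0, 0.1)]
ranked = rank_by_novelty(population, archive, k=2)
print(ranked[0][1])  # (5.0, 5.0): the outlier is the most novel candidate
```

Because selection pressure comes from distance to previously seen behaviors rather than from fitness, candidates clustered near an existing solution score low, which is the mechanism the abstract credits with avoiding local optima.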
Exploration Framework for Synthesizable CGRAs Targeting HPC: Initial Design and Evaluation
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00113
B. Adhi, Carlos Cortes, Y. Tan, Takuya Kojima, Artur Podobas, K. Sano
Abstract: Among the more salient accelerator technologies to continue performance scaling in High-Performance Computing (HPC) are Coarse-Grained Reconfigurable Arrays (CGRAs). However, what benefits CGRAs will bring to HPC workloads and how those benefits will be reaped is an open research question today. In this work, we propose a framework to explore the design space of CGRAs for HPC workloads, which includes a tool flow of compilation and simulation, a CGRA HDL library written in SystemVerilog, and a synthesizable CGRA design as a baseline. Using RTL simulation, we evaluate two well-known computation kernels with the baseline CGRA for multiple different architectural parameters. The simulation results demonstrate both correctness and usefulness of our exploration framework.
Citations: 5
Separated Allocator Metadata in Disaggregated In-Memory Databases: Friend or Foe?
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00207
Marcel Weisgut, Daniel Ritter, Martin Boissier, M. Perscheid
Abstract: Memory allocation has a significant impact on the performance of in-memory databases. While state-of-the-art memory allocators work well in DRAM-only setups, some of their design decisions might no longer yield efficiency if data is tiered to disaggregated memory or secondary memory tiers. In this work, we study, for the first time, the performance impact of metadata in memory allocators and their tiering to disaggregated memory in the context of in-memory databases. We show how to separate metadata and application data using the example of jemalloc, which is widely used for data-intensive applications, and study performance effects for different workloads.
Citations: 2
29th Reconfigurable Architectures Workshop (RAW 2022)
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00018
J. Becker, Lana Josipović, V. Prasanna, M. Santambrogio, R. Vaidyanathan
Abstract: This book presents the proceedings of the 29th Reconfigurable Architectures Workshop (RAW 2022) held in Lyon in May 2022. RAW 2022 is associated with the 36th Annual International Parallel & Distributed Processing Symposium (IPDPS 2022) and is sponsored by the IEEE Computer Society's Technical Committee on Parallel Processing. The workshop is one of the major meetings for researchers to present ideas, results, and ongoing research on both theoretical and practical advances in reconfigurable computing.
Citations: 0
Quantifying Composable Data Center Utilization
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date: 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00206
M. Taubenblatt, A. Tantawi
Abstract: Composable data centers have been a strong topic of research interest, with potential benefits including higher resource utilization, faster and independent refresh cycles, flexible/optimum resource allocation and better capital and operational costs. In this paper we use a discrete event simulation model to provide a quantitative assessment of the utilization benefits for a composable system versus a conventional one. We consider a simple multi-server loss queue situation as well as two more realistic scenarios: a mixed memory size demand and a mixed ratio CPU-GPU demand. In all these scenarios, composable systems can provide many tens of percent utilization advantage over more conventional hardware deployments. We also explore the impact of size of the composable resource pool on utilization.
Citations: 0
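The multi-server loss queue mentioned in the entry above has a well-known closed form under Poisson arrivals, the Erlang B formula, which gives a quick analytic feel for why a larger shared pool can reach higher utilization. The Python sketch below is only such a cross-check: the paper itself uses discrete event simulation, and the server counts and offered loads here are made-up numbers, not values from the paper.

```python
def erlang_b(servers, offered_load):
    """Blocking probability of an M/M/c/c loss queue (offered load in Erlangs).

    Uses the standard recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    """
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def utilization(servers, offered_load):
    """Average fraction of busy servers: carried load divided by server count."""
    return offered_load * (1.0 - erlang_b(servers, offered_load)) / servers


# Two isolated groups of 10 servers at 8 Erlangs each, versus one pool of 20 at 16 Erlangs.
separate = utilization(10, 8.0)
pooled = utilization(20, 16.0)
print(f"separate: {separate:.3f}  pooled: {pooled:.3f}")
```

At the same load per server, the pooled configuration blocks fewer requests and therefore carries more load, which is consistent in direction with the abstract's remark that pool size affects utilization.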