{"title":"Evaluating CUDA Portability with HIPCL and DPCT","authors":"Zheming Jin, J. Vetter","doi":"10.1109/IPDPSW52791.2021.00065","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00065","url":null,"abstract":"HIPCL is expanding the scope of the CUDA portability route from an AMD platform to an OpenCL platform. In the meantime, the Intel DPC++ Compatibility Tool (DPCT) is migrating a CUDA program to a data parallel C++ (DPC++) program. Towards the goal of portability enhancement, we evaluate the performance of the CUDA applications from Rodinia, SHOC, and proxy applications ported using HIPCL and DPCT on Intel GPUs. After profiling the ported programs, we aim to understand their performance gaps, and optimize codes converted by DPCT to improve their performance. The open-source repository for the CUDA, HIP, and DPCT programs will be useful for the development of a translator.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for the Automatic Generation of FPGA-based Near-Data Processing Accelerators in Smart Storage Systems","authors":"Lukas Weber, Lukas Sommer, Leonardo Solis-Vasquez, Tobias Vinçon, Christian Knödler, Arthur Bernhardt, Ilia Petrov, Andreas Koch","doi":"10.1109/IPDPSW52791.2021.00028","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00028","url":null,"abstract":"Near-Data Processing is a promising approach to overcome the limitations of slow I/O interfaces in the quest to analyze the ever-growing amount of data stored in database systems. Next to CPUs, FPGAs will play an important role for the realization of functional units operating close to data stored in non-volatile memories such as Flash.It is essential that the NDP-device understands formats and layouts of the persistent data, to perform operations in-situ. To this end, carefully optimized format parsers and layout accessors are needed. However, designing such FPGA-based Near-Data Processing accelerators requires significant effort and expertise. To make FPGA-based Near-Data Processing accessible to non-FPGA experts, we will present a framework for the automatic generation of FPGA-based accelerators capable of data filtering and transformation for key-value stores based on simple data-format specifications.The evaluation shows that our framework is able to generate accelerators that are almost identical in performance compared to the manually optimized designs of prior work, while requiring little to no FPGA-specific knowledge and additionally providing improved flexibility and more powerful functionality.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"105 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127407605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences","authors":"Quentin G. Anthony, Lang Xu, H. Subramoni, D. Panda","doi":"10.1109/IPDPSW52791.2021.00143","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00143","url":null,"abstract":"Deep Learning (DL) models for super-resolution (DLSR) are an emerging trend in response to the growth of ML/DL applications requiring high-resolution images. DLSR methods have also shown promise in domains such as medical imaging, surveillance, and microscopy. However, DLSR models are extremely computationally demanding, and require unreasonably long training times on modern Volta GPUs. In our experiments, we observed only 10.3 images/second on a single Volta GPU for training EDSR, a state-of-the-art DLSR model for single-image super-resolution. In comparison, a Volta GPU can process 360 images/second while training ResNet-50, a state-of-the-art model for image classification. Therefore, we believe supercomputers provide a good candidate to speed up DLSR model training. In this paper, we select EDSR as the representative DLSR PyTorch model. Further, we introduce Horovod-based distributed EDSR training. However, we observed poor default EDSR scaling performance on the Lassen HPC system at Lawrence Livermore National Laboratory. To investigate the performance degradations, we perform exhaustive communication profiling. These profiling insights are then used to optimize CUDA-Aware MPI for DLSR models by ensuring advanced MPI designs involving CUDA IPC and registration caching are properly applied by DL frameworks. We present a comprehensive scaling study of EDSR with MVAPICH2-GDR and NCCL up to 512 GPUs on Lassen. We demonstrate an improvement in scaling efficiency by 15.6% over default Horovod training, which translates to a 1.26× speedup in training performance.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130221115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to AsHES 2021","authors":"","doi":"10.1109/ipdpsw52791.2021.00072","DOIUrl":"https://doi.org/10.1109/ipdpsw52791.2021.00072","url":null,"abstract":"","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128893257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the GrAPL 2021 Workshop Chairs","authors":"","doi":"10.1109/ipdpsw52791.2021.00043","DOIUrl":"https://doi.org/10.1109/ipdpsw52791.2021.00043","url":null,"abstract":"","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125386806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dovado: An Open-Source Design Space Exploration Framework","authors":"D. Paletti, Davide Conficconi, M. Santambrogio","doi":"10.1109/IPDPSW52791.2021.00027","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00027","url":null,"abstract":"Traditional hardware development exploits description languages such as VHDL and (System)Verilog to produce highly parametrizable RTL designs. Different parameter values yield different utilization-frequency trade-offs, and hand-tuning is not feasible with a non-trivial amount of parameters. Generally, the Computer-Aided Design (CAD) literature proposes approaches that mainly tackle automatic exploration without combining a design automation feature. Hence, this work proposes Dovado, an open-source CAD tool for design space exploration (DSE) tailored for FPGAs-based designs. Starting from VHDL/(System)Verilog, Dovado exploits Vivado and supports the hardware developer for an exact exploration of a given set of parameters or a DSE where it returns the non-dominated set of configuration points. In this work, we exploit a multi-objective integer formulation and Non-Dominated Sorting Genetic Algorithm (NSGA)-II for a fast DSE. Moreover, we propose an approximation model for the NSGA-II fitness function to decide whether Vivado or a Nadaraya-Watson model should estimate the optimization metrics.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121920151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Intensive Computing Modules for Teaching Parallel and Distributed Computing","authors":"M. Gowanlock, Benoît Gallet","doi":"10.1109/IPDPSW52791.2021.00062","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00062","url":null,"abstract":"Parallel and distributed computing (PDC) has found a broad audience that exceeds the traditional fields of computer science. This is largely due to the increasing computational demands of many engineering and domain science research objectives. Thus, there is a demonstrated need to train students with and without computer science backgrounds in core PDC concepts. Given the rise of data science and other data-enabled computational fields, we propose several data-intensive pedagogic modules that are used to teach PDC using message-passing programming with the Message Passing Interface (MPI). These modules employ activities that are common in database systems and scientific workflows that are likely to be employed by domain scientists. Our hypothesis is that using application-driven pedagogic materials facilitates student learning by providing the context needed to fully appreciate the goals of the activities.We evaluated the efficacy of using the data-intensive pedagogic modules to teach core PDC concepts using a sample of graduate students enrolled in a high performance computing course at Northern Arizona University. In the sample, only 30% of students have a traditional computer science background. We found that the hands-on application-driven approach was generally successful at helping students learn core PDC concepts.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124336263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PIGO: A Parallel Graph Input/Output Library","authors":"Kasimir Gabert, Ümit V. Çatalyürek","doi":"10.1109/IPDPSW52791.2021.00050","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00050","url":null,"abstract":"Graph and sparse matrix systems are highly tuned, able to run complex graph analytics in fractions of seconds on billion-edge graphs. For both developers and researchers, the focus has been on computational kernels and not end-to-end runtime. Despite the significant improvements that modern hardware and operating systems have made towards input and output, these can still become application bottlenecks. Unfortunately, on high-performance shared-memory graph systems running billion-scale graphs, reading the graph from file systems easily takes over 2000× longer than running the computational kernel. This slowdown causes both a disconnect for end users and a loss of productivity for researchers and developers.We close the gap by providing a simple to use, small, header-only, and dependency-free C++11 library that brings I/O improvements to graph and matrix systems. Using our library, we improve the end-to-end performance for state-of-the-art systems significantly—in many cases by over 40×.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117229336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}