{"title":"Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures","authors":"M. Riedel, Rocco Sedona, C. Barakat, Pétur Helgi Einarsson, R. Hassanian, Gabriele Cavallaro, Matthias Book, Helmut Neukirchen, A. Lintermann","doi":"10.1109/IPDPSW52791.2021.00019","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00019","url":null,"abstract":"We observe a continuously increased use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., ’big data’) that requires powerful computing resources with equally increasing performance. Consequently, innovative heterogeneous High-Performance Computing (HPC) systems based on multi-core CPUs and many-core GPUs require an architectural design that addresses end user communities’ requirements that take advantage of ML and DL. Still the workloads of end user communities of the simulation sciences (e.g., using numerical methods based on known physical laws) needs to be equally supported in those architectures. This paper offers insights into the Modular Supercomputer Architecture (MSA) developed in the Dynamic Exascale Entry Platform (DEEP) series of projects to address the requirements of both simulation sciences and data-intensive sciences such as High Performance Data Analytics (HPDA). It shares insights into implementing the MSA in the Jülich Supercomputing Centre (JSC) hosting Europe No. 1 Supercomputer Jülich Wizard for European Leadership Science (JUWELS). We augment the technical findings with experience and lessons learned from two application communities case studies (i.e., remote sensing and health sciences) using the MSA with JUWELS and the DEEP systems in practice. Thus, the paper provides details into specific MSA design elements that enable significant performance improvements of ML and DL algorithms. While this paper focuses on MSA-based HPC systems and application experience, we are not losing sight of advances in Cloud Computing (CC) and Quantum Computing (QC) relevant for ML and DL.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127469879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Cryptanalytic Applications with Stochastic Runtimes on GPUs","authors":"Lena Oden, J. Keller","doi":"10.1109/IPDPSW52791.2021.00077","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00077","url":null,"abstract":"We investigate cryptanalytic applications comprised of many independent tasks that exhibit a stochastic runtime distribution. We compare four algorithms for executing such applications on GPUs. We demonstrate that for different distributions, problem sizes, and platforms the best strategy varies. We support our analytic results by extensive experiments on two different GPUs, from different sides of the performance spectrum: A high performance GPU (Nvidia Volta) and an energy saving system on chip (Jetson Nano).","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125437367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EduPar Virtual Poster Session","authors":"Jesús Cámara, José-Carlos Cano, J. Cuenca, Toshiyuki Maeda, Mariano Saura-Sánchez, Lewis Tseng, A. Wakatani, Martina Barnas","doi":"10.1109/IPDPSW52791.2021.00060","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00060","url":null,"abstract":"This paper provides an overview of posters accepted for the EduPar 21 poster session. Poster sessions have been an important part of the EduPar workshops, providing an opportunity to facilitate interactions and fostering the community. After a hiatus caused by the COVID-19 pandemic we decided to resume the poster session tradition and to hold the first virtual poster session in EduPar’s eleven years history.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132982868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching Complex Scheduling Algorithms","authors":"S. Hunold, Bartłomiej Przybylski","doi":"10.1109/IPDPSW52791.2021.00058","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00058","url":null,"abstract":"We introduce Scheduling.jland show how it can be used for teaching the basics of scheduling theory to Computer Science students. In particular, our course focuses on scheduling algorithms for parallel, identical machines. For these problems, approximation algorithms and approximation schemes exist. However, we believe that students better understand advantages as well as disadvantages of these approximation algorithms when they investigate their implementations and examine how the algorithms work in practice. For that purpose, we have implemented a set of heuristics and approximation algorithms on top of Scheduling.jl. In the present article, we go through some of the implemented algorithms and explain why we believe these algorithms are particularly helpful for students to understand the basic concepts of approximation algorithms. In our experience, students remember algorithmic details much better if we show them examples using Scheduling.jl.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"293 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133097217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping","authors":"Gulsum Gudukbay, J. Gunasekaran, Yilin Feng, M. Kandemir, A. Nekrutenko, C. Das, P. Medvedev, B. Grüning, Nate Coraor, Nathan P Roach, E. Afgan","doi":"10.1109/IPDPSW52791.2021.00037","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00037","url":null,"abstract":"Galaxy is an open-source web-based framework that is widely used for performing computational analyses in diverse application domains, such as genome assembly, computational chemistry, ecology, and epigenetics, to name a few. The current Galaxy software framework runs on several high-performance computing platforms such as on-premise clusters, public data centers, and national lab supercomputers. These infrastructures also provide support for state-of-the-art accelerators like Graphical Processing Units (GPUs). When coupled with accelerator support, the tools executing in Galaxy can benefit from massive performance gains in terms of computation time, thereby allowing a more robust computational analysis environment for researchers. Despite tools having GPU capabilities, the current Galaxy framework does not support GPUs, and thus prevents tools from taking advantage of the performance benefits offered by GPUs. We present and experimentally evaluate GYAN, a GPU-aware computation mapping and orchestration functionality implemented in Galaxy that allows the Galaxy tools to be executed on a GPU-enabled cluster. GYAN has the capability of identifying GPU-supported tools and scheduling them on single or multiple GPU nodes based on the availability in the cluster. GYAN supports both native and containerized tool execution. We performed extensive evaluations of the implementation using popular bio-engineering tools to demonstrate the benefits of using GPU technologies. For example, the Racon consensus tool executes ~2× faster than the regular baseline CPU-only jobs, while the Bonito base calling tool shows ~50× speedup.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133408309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling HPC Workflows with Intel Optane Persistent Memory","authors":"R. Venkatesh, Tony Mason, Pradeep R. Fernando, G. Eisenhauer, Ada Gavrilovska","doi":"10.1109/IPDPSW52791.2021.00017","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00017","url":null,"abstract":"HPC workloads and their Increasing data processing demands have led to using in situ execution, which couples simulation and analytics to reduce cross node memory accesses and their negative impact on overall performance. In situ executions can benefit from new classes of persistent memory technologies, such as Intel® Optane™ DC Persistent Memory (PMEM), which provide a denser, lower cost, and lower performance memory option for server class machines. However, PMEM creates a new set of trade-offs that must be considered to further improve performance for these HPC workloads and to realize the expected benefits. Prior work has only focused on describing how to tune for a single workload component, which may not yield optimal results for the entire workload.In this paper, we use a suite of workflows with different characteristics to understand the impact of using PMEM for in situ workflow executions with respect to different decisions on how PMEM is shared. Based on our experimental observations, we make recommendations for the considerations that must be incorporated for future workflow schedulers to maximize the benefits of the PMEM resource.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"46 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131452741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring a Layer-based Pre-implemented Flow for Mapping CNN on FPGA","authors":"Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, C. Bobda","doi":"10.1109/IPDPSW52791.2021.00025","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00025","url":null,"abstract":"Convolutional Neural Networks are compute-intensive learning models that have demonstrated ability and effectiveness in solving complex learning problems. However, developing a high-performance FPGA accelerator for CNN often demands high programming skills, hardware verification, precise distribution localization, and long development cycles. Besides, CNN depth increases by reuse and replication of multiple layers. This paper proposes a programming flow for CNN on FPGA to generate high-performance accelerators by assembling CNN pre-implemented components as a puzzle based on the graph topology. Using pre-implemented components allows us to use the minimum of resources necessary, predict the performance, and gain in productivity since there is no need to synthesize any HDL code. Furthermore, components can be reused for a different range of applications. Through prototyping, we demonstrated the viability and relevance of our approach. Experiments show a productivity improvement of up to 69% compared to a traditional FPGA implementation while achieving over 1.75× higher Fmax with lower resources and power consumption.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132205760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Data Parallelism Code Restructuring for HLS Targeting FPGAs","authors":"Renato Campos, João MP Cardoso","doi":"10.1109/IPDPSW52791.2021.00029","DOIUrl":"https://doi.org/10.1109/IPDPSW52791.2021.00029","url":null,"abstract":"FPGAs have emerged as hardware accelerators, and in the last decade, researchers have proposed new languages and frameworks to improve the efficiency when mapping computations to FPGAs. One of the main tasks when considering the mapping of software code to FPGAs is code restructuring. Code restructuring is of paramount importance to achieve efficient FPGA-based accelerators, and its automation continues to be a challenge. This paper describes our recent work on techniques to automatically restructure and annotate C code with directives optimized for HLS targeting FPGAs. The input of our approach consists of an unfolded dataflow graph (DFG), currently obtained by a trace of the program’s execution, and restructured C code with HLS directives as output. Specifically, in this paper we propose algorithms to optimize the input DFGs and use isomorphic graph detection for exposing data-level parallelism. The experimental results show that our approach is able to generate efficient FPGA implementations, with significant speedups over the input unmodified source codes, and very competitive to implementations obtained by manual optimizations and by previous approaches. Furthermore, the experiments show that, using our approach, it is possible to extract data-parallelism in linear to quadratic time with respect to the number of nodes of the input DFG.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114132811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to SNACS 2021","authors":"","doi":"10.1109/ipdpsw52791.2021.00121","DOIUrl":"https://doi.org/10.1109/ipdpsw52791.2021.00121","url":null,"abstract":"","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123801567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}