2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Articles, Page 3

Designing Effective Sparse Expert Models

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00171

Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus

Citations: 58

Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00083

Shulei Xu, A. Shafi, H. Subramoni, D. Panda

Abstract: Recent advances in the HPC cloud field have made multi-core high-performance VM services more accessible. Emerging Arm-based HPC systems are also receiving more attention. Amazon Web Services recently announced new c6gn instances with a Graviton2 Arm CPU on each node and support for the Elastic Fabric Adapter (EFA), which makes it the leading vendor of high-performance Arm-based cloud systems. In this paper, we characterize the performance and capability of the AWS Arm architecture. We explore the performance optimization of current MPI libraries based on features of Arm-based cloud systems and the Scalable Reliable Datagram protocol of EFA, and we evaluate the impact of our optimizations on high-performance MPI libraries. Our study shows that performance optimization of the MPI library on AWS Arm systems significantly improves MPI communication performance at both the benchmark and application levels. We gain up to 86% performance improvement in micro-benchmark-level collective communication operations and up to 9% improvement at the application level with Weather Research and Forecasting. This paper provides a comprehensive performance evaluation of several popular MPI libraries on AWS Arm-based cloud systems with EFA support. HPC application developers and users can draw on our study to achieve better application performance on Arm-based cloud systems with EFA support.

Citations: 2

RAW 2022 Keynote Speaker 1: Using FPGAs in datacenters and the cloud

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00020

G. Alonso

Citations: 0

Teaching Heterogeneous Computing Using DPC++

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00069

J. Fuentes, Daniel López, Sebastián González

Citations: 4

An Architecture-Independent CGRA Compiler enabling OpenMP Applications

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00112

Takuya Kojima, B. Adhi, Carlos Cortes, Y. Tan, K. Sano

Abstract: Coarse-grained reconfigurable architecture (CGRA) is a promising platform for HPC systems in the post-Moore era. A single-source programming model is essential for practical heterogeneous computing. However, there is no canonical programming model or frontend compiler for CGRAs. Existing CGRAs vary widely in their execution model, computational capability, and system structure, which magnifies the difficulty of orchestrating compiler techniques. This consequently forces CGRA designers to develop a compiler from scratch that works only for their own architecture. Such an approach is outdated, given other successful accelerators such as GPUs and FPGAs. This paper presents a new CGRA compiler framework to reduce the development effort of CGRA applications. OpenMP-annotated code is fed into the proposed compiler, as recent OpenMP versions support device offloading to accelerators. This property improves the reusability of existing source code for HPC workloads. The design of the compiler is inspired by LLVM, the most famous compiler framework, so the frontend is built to be architecture-independent. In this work, we demonstrate that the proposed compiler can handle different types of CGRAs without changing the source code. In addition, we discuss the effect of architecture-independent optimization algorithms. We also provide an open-source implementation of the compiler framework at https://github.com/hal-lab-u-tokyo/CGRAOmp.

Citations: 3

Optimal Triangulation on the High Bandwidth Memory Model

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00089

K. Nakano, V. Poupet

Abstract: The High Bandwidth Memory (HBM) model is a theoretical computing model consisting of a logic circuit with a large external memory. Each address of the external memory can store $p$ elements, which can be read or written at the same time. Access to the $p$ elements stored at a given address in the external memory has a latency of $l$ clock cycles. However, access to any $k$ consecutive addresses can be done in only $(k+l-1)$ clock cycles in a pipelined fashion using burst mode. A hardware algorithm is implemented in a logic circuit of the HBM to solve a particular problem. In this paper, we present an optimal implementation of the $O(n^{3})$-time dynamic programming algorithm for solving the optimal polygon triangulation (OPT) problem, which is the problem of finding a triangulation with minimum total weight of an input convex $n$-gon with weighted chords. We assume that the input weight matrix of a convex $n$-gon is stored in the external memory of the HBM model. Our hardware algorithm, implemented in a logic circuit of size $O(s^{2})$, operates on it and computes the optimal polygon triangulation of the input polygon in $O(\frac{n^{3}}{sp}+\frac{n^{3}}{s^{2}}+\frac{n^{3}}{s^{3}}l)$ time. We also provide a theoretical proof showing that any hardware algorithm in a logic circuit of size $O(s^{2})$ takes at least $\Omega(\frac{n^{3}}{sp}+\frac{n^{3}}{s^{2}})$ time to solve the OPT problem. Thus, our implementation is optimal whenever $s^{2}\geq lp$ or $s\geq l$, and this optimality condition is always satisfied from a practical point of view.

Citations: 0
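
The $O(n^{3})$-time dynamic program named in the abstract above can be sketched in software as follows. This is a minimal sequential Python sketch of a common textbook formulation, not the authors' hardware algorithm. It assumes that vertices are numbered 0 to n-1 around the polygon, that polygon sides carry zero weight, and that `w[i][j]` (for i < j) holds the weight of chord (i, j). The names `optimal_triangulation_weight` and `burst_cycles` are hypothetical and chosen only for illustration. `burst_cycles` restates the access cost given in the abstract: k consecutive addresses take k + l - 1 clock cycles in burst mode.

```python
from typing import Sequence


def burst_cycles(k: int, l: int) -> int:
    """Cycles to read k consecutive addresses with latency l in burst mode."""
    return k + l - 1


def optimal_triangulation_weight(w: Sequence[Sequence[float]]) -> float:
    """Minimum total chord weight over all triangulations of a convex n-gon.

    Assumption: only w[i][j] with i < j is read, and polygon sides
    (adjacent vertices, including the side between 0 and n-1) weigh 0.
    """
    n = len(w)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")

    def chord(i: int, j: int) -> float:
        # Sides of the polygon are not chords and contribute nothing.
        if j - i == 1 or (i == 0 and j == n - 1):
            return 0
        return w[i][j]

    # m[i][j]: best weight for the sub-polygon on vertices i..j,
    # including the weight of its closing segment (i, j).
    m = [[0] * n for _ in range(n)]
    for span in range(2, n):              # distance between i and j
        for i in range(n - span):
            j = i + span
            # Vertex k forms the triangle (i, k, j) on segment (i, j).
            best = min(m[i][k] + m[k][j] for k in range(i + 1, j))
            m[i][j] = best + chord(i, j)
    return m[0][n - 1]
```

The three nested ranges (span, i, k) give the cubic running time; the hardware bounds in the abstract describe how that work is distributed over a circuit of size $O(s^{2})$ and memory words of $p$ elements, which this sketch does not model.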

The First International Workshop on COmputing using EmeRging EXotic AI-Inspired Systems (CORtEX'22)

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00212

Citations: 0

CORtEX 2022 Invited Speaker 3: Neuromorphic computing: from modelling the brain to bio-inspired AI

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00215

Oliver Rhodes

Citations: 0

A SHA-512 Hardware Implementation Based on Block RAM Storage Structure

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00031

Mingyuan Yang, Yemeng Zhang, Bohan Yang, Hanning Wang, S. Yin, Shaojun Wei, Leibo Liu

Citations: 0

HiCOMB 2022 Invited Speaker: Pandemic-scale Phylogenetics

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00035

Yatish Turakhia

Abstract: Phylogenetics has been central to the genomic surveillance, epidemiology, and contact tracing efforts during the COVID-19 pandemic. But the massive scale of genomic sequencing has rendered the pre-pandemic tools quite inadequate for comprehensive phylogenetic analyses. In this talk, I will discuss a high-performance computing (HPC) phylogenetic package that we developed to address the needs imposed by this pandemic. Orders-of-magnitude gains were achieved by this package through several domain-specific optimization and parallelization techniques. The package comprises four programs: UShER, matOptimize, RIPPLES, and matUtils. Using high-performance computing, UShER and matOptimize maintain and refine daily a massive mutation-annotated phylogenetic tree consisting of all (>9M currently) SARS-CoV-2 sequences available in online repositories. With UShER and RIPPLES, individual labs, even those with modest compute resources, incorporate newly sequenced SARS-CoV-2 genomes into this phylogeny and discover evidence for recombination in real time. With matUtils, they rapidly query and visualize massive SARS-CoV-2 phylogenies. This has empowered scientists worldwide to study SARS-CoV-2 evolutionary and transmission dynamics at an unprecedented scale, resolution, and speed. This has laid the groundwork for future genomic surveillance of most infectious pathogens.

Citations: 0