{"title":"“Crosscutting Themes in Computer Science: Where Does PDC Education Fit?”","authors":"R. Raj","doi":"10.1109/IPDPSW55747.2022.00063","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00063","url":null,"abstract":"Since 1968, ACM and IEEE Computer Society have jointly led the development of curricular guidelines in various computing disciplines, starting with Computer Science (CS). The last major release of the undergraduate CS Curriculum Guidelines (CS2013) recognized 18 knowledge areas underpinning the discipline; the next decennial release is also likely to have the same number of knowledge areas. Viewing these knowledge areas as distinct silos does disservice to their interconnected nature, especially as crosscutting or recurring themes run across them and help to unify fundamental concepts in the CS discipline. In this talk, I will discuss crosscutting themes as providing an orthogonal view of the CS discipline, a view girded by knowledge and experience gained over the past 50 years. Providing explicit instruction in the presence and variety of crosscutting themes in CS will help students see each area not just in a silo of insular ideas, but also as part of the ethos of the discipline. I will use examples from the different knowledge areas to show where Parallel and Distributed Computing could fit into CS.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122734933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU","authors":"Zhen Xie, Siddhisanket Raskar, M. Emani","doi":"10.1109/IPDPSW55747.2022.00176","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00176","url":null,"abstract":"Deep Neural Networks (DNNs) have transformed the field of artificial intelligence and achieved extraordinary success in many areas. The training of DNNs is commonly compute and memory-intensive, which has resulted in several optimizations in the training phase. Among them, reduced precision is a typical and widely used technique to accelerate DNN training and reduce memory requirements. However, applying a widely adopted reduced precision format such as Float16 to all involved operations in DNN training is not optimal as the use of Float16 in some operations can hurt model accuracy. Meanwhile, additional optimizations including loss scaling and autocast techniques can mitigate the accuracy loss but lead to inherent overhead and inadequate use of reduced precision. In this work, we leverage another reduced precision format, BFloat16, and introduce a throughput-oriented and accuracy-aware approach to maximize the performance potential of DNN training. Since the high throughput provided by BFloat16 format is accompanied by low precision of the floating-point representation, this approach achieves high throughput by using BFloat16 on all DNN op-erations and avoids the accuracy loss through a customized accuracy-aware normalization. Results show that our approach outperforms the state-of-the-art mixed precision training by 1.21x on an NVIDIA A100 GPU.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123294753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Java-based HPC using the MVAPICH2 Library: Early Experiences","authors":"Kinan Al-Attar, A. Shafi, H. Subramoni, D. Panda","doi":"10.1109/IPDPSW55747.2022.00091","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00091","url":null,"abstract":"There has been sporadic interest in using Java for High Performance Computing (HPC) in the past. These earlier efforts have resulted in several Java Message Passing Interface (MPI) [1] libraries including mpiJava [2], FastMPJ [3], MPJ Express [4], and Java Open MPI [5]. In this paper, we present our efforts in designing and implementing Java bindings for the MVAPICH2 [6] library. The MVAPICH2 Java bindings (MVAPICH2-J) follow the same API as the Java Open MPI library. MVAPICH2-J also provides support for communicating direct New I/O (NIO) ByteBuffers and Java arrays. Direct ByteBuffers reside outside JVM heaps and are not subject to the garbage collection. The library implements and utilizes a buffering layer to explicitly manage memory to avoid creating buffers every time a Java array message is communicated. In order to evaluate the performance of MVAPICH2-J and other Java MPI libraries, we also designed and implemented OMB-J that is a Java extension to the popular OSU Micro-Benchmarks suite (OMB) [7]. OMB-J currently supports a range of bench-marks for evaluating point-to-point and collective communication primitives. We also added support for communicating direct ByteBuffers and Java arrays. Our evaluations reveal that at the OMB-J level, ByteBuffers are superior in performance due to the elimination of extra copying between the Java and the Java Native Interface (JNI) layer. MVAPICH2-J achieves similar performance to Java Open MPI for ByteBuffers in point-to-point communication primitives that is evaluated using latency and bandwidth benchmarks. For Java arrays, there is a slight overhead for MVAPICH2-J due to the use of the buffering layer. For the collective communication benchmarks, we observe good performance for MVAPICH2-J. Where, MVAPICH2-J fairs better than Java Open MPI with ByteBuffers by $a$ factor of 6.2 and 2.76 for broadcast and all reduce, respectively, on average for all messages sizes. And, using Java arrays, $2. 2times$ and $1. 62times$ on average for broadcast and allreduce, respectively. The collective communication performance is dictated by the performance of the respective native MPI libraries.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123480112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully Dynamic Line Maintenance by Hybrid Programmable Matter","authors":"Nooshin Nokhanji, P. Flocchini, N. Santoro","doi":"10.1109/IPDPSW55747.2022.00087","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00087","url":null,"abstract":"Motivated by the manipulation of nanoscale materials, recent investigations have focused on hybrid systems where passive elements incapable of movement, called tiles, are manipulated by one or more mobile entities, called robots, with limited computational capabilities. Like in most self-organizing systems, the fundamental concern is with the (geometric) shapes created by the position of the tiles; among them, the line is perhaps the most important. The existing investigations have focused on formation of the shape, but not on its reconfiguration following the failure of some of the tiles. In this paper, we study the problem of maintaining a line formation in presence of fully dynamic failures: any tile can stop functioning at any time. We show how this problem can be solved by a group of very simple robots, with the computational power of deterministic finite automata.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125559924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reproducibility of Bioinformatics Tools","authors":"P. Baykal, N. Beerenwinkel, S. Mangul","doi":"10.1109/IPDPSW55747.2022.00046","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00046","url":null,"abstract":"We introduce a fast and scalable method to assess reproducibility of bioinformatics tools. We replace replicates which are cause of data variation by synthetic replicates. To assess reproducibility of bioinformatics tools, we run the tools with two different types of synthetic replicates and compare results obtained from the original data. Results show differences between output obtained from original data and synthetic replicates.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131955332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Uncore Frequency and Dynamic Power Capping to Improve Power Savings","authors":"Amina Guermouche","doi":"10.1109/IPDPSW55747.2022.00164","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00164","url":null,"abstract":"The US Department of Energy sets a limit of 20 to 30 MW for future exascale machines. In order to control their power consumption, modern processors provide many features. Power capping and uncore frequency scaling are examples of such features which allow to limit the power consumed by a processor. In this paper, we propose to combine dynamic power capping to uncore frequency scaling. We propose DUFP, an extension of DUF, an existing tool which dynamically adapts uncore frequency. DUFP dynamically adapts the processor power cap to the application needs. Finally, just like DUF, DUFP can tolerate performance loss up to a user-defined limit. With a controlled impact on performance, DUFP is able to provide power savings with no energy consumption increase. The evaluation of DUFP shows that it manages to stay within the user-defined slowdown limits for most of the studied applications. Moreover, combining uncore frequency scaling to power capping: (i) improves power consumption by up to 13.98 % with additional energy savings for applications where uncore frequency scaling has a limited impact, (ii) improves power consumption by up to 7.90 % compared to using uncore frequency scaling by itself and (iii) leads to more than 5 % power savings at 5 % tolerated slowdown with no energy consumption increase for most applications.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122334869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Unified Memory Performance in HIP","authors":"Zheming Jin, J. Vetter","doi":"10.1109/IPDPSW55747.2022.00096","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00096","url":null,"abstract":"Heterogeneous unified memory management between a CPU and a GPU is a major challenge in GPU computing. Recently, unified memory (UM) has been supported by software and hardware components on AMD computing platforms. The support could simplify the complexities of memory management. In this paper, we attempt to have a better understanding of UM by evaluating the performance of UM programs on an AMD MI100 GPU. More specifically, we evaluate data migration using UM against other data transfer techniques for the overall performance of an application, assess the impacts of three commonly used optimization techniques on the kernel execution time of a vector add sample, and compare the performance and productivity of selected benchmarks with and without UM. The performance overhead associated with UM is not trivial, but it can improve programming productivity by reducing lines of code for scientific applications. We aim to present early results and feedback on the UM performance to the vendor.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133866421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerated LD-based selective sweep detection using GPUs and FPGAs","authors":"Reinout Corts, Niek Sterenborg, Nikolaos S. Alachiotis","doi":"10.1109/IPDPSW55747.2022.00044","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00044","url":null,"abstract":"Selective sweep detection carries theoretical significance and has several practical implications, from explaining the adaptive evolution of a species in an environment to understanding the emergence of viruses from animals, such as SARS-CoV-2, and their transmission from human to human. The plethora of available genomic data for population genetic analyses, however, poses various computational challenges to existing methods and tools, leading to prohibitively long analysis times. In this work, we accelerate LD (Linkage Disequilibrium) - based selective sweep detection using GPUs and FPGAs on personal computers and datacenter infrastructures. LD has been previously efficiently accelerated with both GPUs and FPGAs. However, LD alone cannot serve as an indicator of selective sweeps. Here, we complement previous research with dedicated accelerators for the ω statistic, which is a direct indicator of a selective sweep. We evaluate performance of our accelerator solutions for computing the $w$ statistic and for a complete sweep detection method, as implemented by the open-source software OmegaPlus. In comparison with a single CPU core, the FPGA accelerator delivers up to 57.1× and 61.7× faster computation of the ω statistic and the complete sweep detection analysis, respectively. The respective attained speedups by the GPU-accelerated version of OmegaPlus are 2.9× and 12.9×. The GPU-accelerated implementation is available for download here: https://github.com/MrKzn/omegaplus.git.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"188 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134059060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Batch Parallel Algorithms for Updating PageRank","authors":"Subhajit Sahu, Kishore Kothapalli, D. Banerjee","doi":"10.1109/IPDPSW55747.2022.00186","DOIUrl":"https://doi.org/10.1109/IPDPSW55747.2022.00186","url":null,"abstract":"The design and implementation of parallel algorithms for dynamic graph problems is attracting significant research attention in the recent years, driven by numerous applications to social network analysis, neuroscience, and protein interaction networks. One such problem is the computation of PageRank values of vertices in a directed graph. This paper presents two new parallel algorithms for recomputing the PageRank values of vertices in a dynamic graph. Our techniques require the recomputation of the PageRank of only the vertices affected by the insertion/deletion of a batch of edges. We conduct detailed experimental studies of our algorithm on a set of 11 real-world graphs. Our results on Intel Xeon Silver 4116 CPU and NVIDIA Tesla V100 PCIe 16GB GPU indicate that our algorithms outperform static and dynamic update algorithms by $6.1times$: and $8.6times mathbf{on}$ the CPU, and by 9.8×and $9.3timesmathbf{on}$ the GPU respectively. We also compare the performance of the algorithms in batched mode to cumulative single-edge updates.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132045973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}