2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) — Latest Publications

“Crosscutting Themes in Computer Science: Where Does PDC Education Fit?”
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00063
R. Raj
Since 1968, ACM and the IEEE Computer Society have jointly led the development of curricular guidelines in various computing disciplines, starting with Computer Science (CS). The last major release of the undergraduate CS Curriculum Guidelines (CS2013) recognized 18 knowledge areas underpinning the discipline; the next decennial release is likely to have the same number of knowledge areas. Viewing these knowledge areas as distinct silos does a disservice to their interconnected nature, especially as crosscutting or recurring themes run across them and help to unify fundamental concepts in the CS discipline. In this talk, I will discuss crosscutting themes as providing an orthogonal view of the CS discipline, a view girded by knowledge and experience gained over the past 50 years. Providing explicit instruction in the presence and variety of crosscutting themes in CS will help students see each area not just as a silo of insular ideas, but also as part of the ethos of the discipline. I will use examples from the different knowledge areas to show where Parallel and Distributed Computing could fit into CS.
Citations: 0
Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00176
Zhen Xie, Siddhisanket Raskar, M. Emani
Deep Neural Networks (DNNs) have transformed the field of artificial intelligence and achieved extraordinary success in many areas. Training DNNs is commonly compute- and memory-intensive, which has motivated several optimizations of the training phase. Among them, reduced precision is a typical and widely used technique to accelerate DNN training and reduce memory requirements. However, applying a widely adopted reduced-precision format such as Float16 to all operations in DNN training is not optimal, as the use of Float16 in some operations can hurt model accuracy. Meanwhile, additional optimizations such as loss scaling and autocast can mitigate the accuracy loss, but they introduce inherent overhead and make inadequate use of reduced precision. In this work, we leverage another reduced-precision format, BFloat16, and introduce a throughput-oriented and accuracy-aware approach to maximize the performance potential of DNN training. Since the high throughput provided by the BFloat16 format is accompanied by low precision of the floating-point representation, this approach achieves high throughput by using BFloat16 on all DNN operations and avoids the accuracy loss through a customized accuracy-aware normalization. Results show that our approach outperforms state-of-the-art mixed-precision training by 1.21× on an NVIDIA A100 GPU.
Citations: 1
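The accuracy concern the abstract raises comes from BFloat16's 7-bit mantissa: it keeps Float32's 8-bit exponent (and thus its range) but drops most of the precision. A minimal pure-Python sketch, using truncation as a simplification of the round-to-nearest-even that real hardware performs, makes the effect visible:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate bfloat16 by truncating a float32 to its top 16 bits.

    Real hardware rounds to nearest even; truncation is a simplification
    that still shows the 7-bit-mantissa precision loss.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    bits &= 0xFFFF0000  # keep sign bit, 8-bit exponent, 7-bit mantissa
    (y,) = struct.unpack(">f", struct.pack(">I", bits))
    return y

# bfloat16 retains float32's range but only ~2-3 decimal digits:
print(to_bfloat16(3.14159))  # 3.140625
print(to_bfloat16(1e38))     # still finite, unlike float16
```

This is why the paper pairs BFloat16 throughput with an accuracy-aware normalization rather than applying the format blindly.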
Towards Java-based HPC using the MVAPICH2 Library: Early Experiences
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00091
Kinan Al-Attar, A. Shafi, H. Subramoni, D. Panda
There has been sporadic interest in using Java for High Performance Computing (HPC) in the past. These earlier efforts have resulted in several Java Message Passing Interface (MPI) [1] libraries, including mpiJava [2], FastMPJ [3], MPJ Express [4], and Java Open MPI [5]. In this paper, we present our efforts in designing and implementing Java bindings for the MVAPICH2 [6] library. The MVAPICH2 Java bindings (MVAPICH2-J) follow the same API as the Java Open MPI library. MVAPICH2-J also provides support for communicating direct New I/O (NIO) ByteBuffers and Java arrays. Direct ByteBuffers reside outside JVM heaps and are not subject to garbage collection. The library implements a buffering layer to explicitly manage memory, avoiding the creation of buffers every time a Java-array message is communicated. To evaluate the performance of MVAPICH2-J and other Java MPI libraries, we also designed and implemented OMB-J, a Java extension of the popular OSU Micro-Benchmarks suite (OMB) [7]. OMB-J currently supports a range of benchmarks for evaluating point-to-point and collective communication primitives, with support for communicating both direct ByteBuffers and Java arrays. Our evaluations reveal that at the OMB-J level, ByteBuffers are superior in performance due to the elimination of extra copying between Java and the Java Native Interface (JNI) layer. MVAPICH2-J achieves performance similar to Java Open MPI for ByteBuffers in point-to-point communication primitives, as evaluated using latency and bandwidth benchmarks. For Java arrays, MVAPICH2-J incurs a slight overhead due to the use of the buffering layer. For the collective communication benchmarks, we observe good performance for MVAPICH2-J: with ByteBuffers, it fares better than Java Open MPI by factors of 6.2 and 2.76 for broadcast and allreduce, respectively, on average across all message sizes; with Java arrays, by 2.2× and 1.62× on average for broadcast and allreduce, respectively. The collective communication performance is dictated by the performance of the respective native MPI libraries.
Citations: 1
Fully Dynamic Line Maintenance by Hybrid Programmable Matter
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00087
Nooshin Nokhanji, P. Flocchini, N. Santoro
Motivated by the manipulation of nanoscale materials, recent investigations have focused on hybrid systems where passive elements incapable of movement, called tiles, are manipulated by one or more mobile entities, called robots, with limited computational capabilities. As in most self-organizing systems, the fundamental concern is with the (geometric) shapes created by the positions of the tiles; among them, the line is perhaps the most important. Existing investigations have focused on the formation of the shape, but not on its reconfiguration following the failure of some of the tiles. In this paper, we study the problem of maintaining a line formation in the presence of fully dynamic failures: any tile can stop functioning at any time. We show how this problem can be solved by a group of very simple robots with the computational power of deterministic finite automata.
Citations: 0
Reproducibility of Bioinformatics Tools
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00046
P. Baykal, N. Beerenwinkel, S. Mangul
We introduce a fast and scalable method to assess the reproducibility of bioinformatics tools. We replace replicates, which are a source of data variation, with synthetic replicates. To assess the reproducibility of a tool, we run it on two different types of synthetic replicates and compare the outputs against those obtained from the original data. Our results show differences between the output obtained from the original data and that obtained from the synthetic replicates.
Citations: 0
IEEE Workshop on Parallel / Distributed Combinatorics and Optimization (PDSC)
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00127
Citations: 0
Combining Uncore Frequency and Dynamic Power Capping to Improve Power Savings
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00164
Amina Guermouche
The US Department of Energy has set a limit of 20 to 30 MW for future exascale machines. To control power consumption, modern processors provide many features; power capping and uncore frequency scaling are examples that limit the power consumed by a processor. In this paper, we propose to combine dynamic power capping with uncore frequency scaling. We propose DUFP, an extension of DUF, an existing tool that dynamically adapts the uncore frequency; DUFP additionally adapts the processor power cap to the application's needs. Finally, just like DUF, DUFP can tolerate performance loss up to a user-defined limit. With a controlled impact on performance, DUFP provides power savings with no increase in energy consumption. Our evaluation shows that DUFP stays within the user-defined slowdown limits for most of the studied applications. Moreover, combining uncore frequency scaling with power capping: (i) improves power consumption by up to 13.98 %, with additional energy savings for applications where uncore frequency scaling has a limited impact, (ii) improves power consumption by up to 7.90 % compared to using uncore frequency scaling by itself, and (iii) leads to more than 5 % power savings at 5 % tolerated slowdown with no increase in energy consumption for most applications.
Citations: 0
Evaluating Unified Memory Performance in HIP
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00096
Zheming Jin, J. Vetter
Heterogeneous unified memory management between a CPU and a GPU is a major challenge in GPU computing. Recently, unified memory (UM) has been supported by software and hardware components on AMD computing platforms. This support could simplify the complexities of memory management. In this paper, we attempt to gain a better understanding of UM by evaluating the performance of UM programs on an AMD MI100 GPU. More specifically, we evaluate data migration using UM against other data transfer techniques for the overall performance of an application, assess the impacts of three commonly used optimization techniques on the kernel execution time of a vector-add sample, and compare the performance and productivity of selected benchmarks with and without UM. The performance overhead associated with UM is not trivial, but UM can improve programming productivity by reducing lines of code for scientific applications. We aim to present early results and feedback on UM performance to the vendor.
Citations: 1
Accelerated LD-based selective sweep detection using GPUs and FPGAs
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00044
Reinout Corts, Niek Sterenborg, Nikolaos S. Alachiotis
Selective sweep detection carries theoretical significance and has several practical implications, from explaining the adaptive evolution of a species in an environment to understanding the emergence of viruses from animals, such as SARS-CoV-2, and their transmission from human to human. The plethora of genomic data available for population genetic analyses, however, poses various computational challenges to existing methods and tools, leading to prohibitively long analysis times. In this work, we accelerate LD (linkage disequilibrium)-based selective sweep detection using GPUs and FPGAs on personal computers and datacenter infrastructures. LD has previously been efficiently accelerated with both GPUs and FPGAs; however, LD alone cannot serve as an indicator of selective sweeps. Here, we complement previous research with dedicated accelerators for the ω statistic, which is a direct indicator of a selective sweep. We evaluate the performance of our accelerator solutions for computing the ω statistic and for a complete sweep detection method, as implemented by the open-source software OmegaPlus. In comparison with a single CPU core, the FPGA accelerator delivers up to 57.1× and 61.7× faster computation of the ω statistic and the complete sweep detection analysis, respectively; the respective speedups attained by the GPU-accelerated version of OmegaPlus are 2.9× and 12.9×. The GPU-accelerated implementation is available for download here: https://github.com/MrKzn/omegaplus.git.
Citations: 0
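The pairwise LD signal that such accelerators compute is commonly summarized by the r² statistic between two SNPs (the ω statistic then aggregates many such pairwise values across a genomic window). A small illustrative sketch of r², not the OmegaPlus implementation:

```python
def ld_r2(snp_a, snp_b):
    """Pairwise linkage disequilibrium r^2 for two biallelic SNPs.

    snp_a, snp_b: 0/1 allele lists over the same set of samples.
    """
    n = len(snp_a)
    p_a = sum(snp_a) / n  # derived-allele frequency at site A
    p_b = sum(snp_b) / n  # derived-allele frequency at site B
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a and b) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else 0.0

print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0 (perfect linkage)
print(ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0 (independent sites)
```

Computing this for all SNP pairs in a window is quadratic in the number of SNPs, which is why LD computation is the natural target for GPU/FPGA acceleration.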
Dynamic Batch Parallel Algorithms for Updating PageRank
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2022-05-01 DOI: 10.1109/IPDPSW55747.2022.00186
Subhajit Sahu, Kishore Kothapalli, D. Banerjee
The design and implementation of parallel algorithms for dynamic graph problems has attracted significant research attention in recent years, driven by numerous applications in social network analysis, neuroscience, and protein interaction networks. One such problem is the computation of the PageRank values of vertices in a directed graph. This paper presents two new parallel algorithms for recomputing the PageRank values of vertices in a dynamic graph. Our techniques require the recomputation of the PageRank of only the vertices affected by the insertion or deletion of a batch of edges. We conduct detailed experimental studies of our algorithms on a set of 11 real-world graphs. Our results on an Intel Xeon Silver 4116 CPU and an NVIDIA Tesla V100 PCIe 16GB GPU indicate that our algorithms outperform static and dynamic update algorithms by 6.1× and 8.6× on the CPU, and by 9.8× and 9.3× on the GPU, respectively. We also compare the performance of the algorithms in batched mode to cumulative single-edge updates.
Citations: 0
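For reference, the quantity being maintained incrementally is the classic PageRank fixed point. A minimal static power-iteration sketch in pure Python (the paper's contribution, recomputing only the vertices affected by a batch of edge updates, is not implemented here):

```python
def pagerank(graph, d=0.85, iters=50):
    """Static PageRank by power iteration.

    graph: dict mapping each node to a list of its out-neighbours.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # every node starts with the teleport mass (1 - d) / n
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = d * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # dangling node: spread its mass evenly over all nodes
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank
```

A dynamic-update scheme like the paper's would rerun this iteration only on the subgraph reachable from the endpoints of inserted or deleted edges, instead of over all vertices.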