2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC): Latest Publications

IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library
Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Pedro Valero-Lara, Frank Liu, J. Vetter
DOI: 10.1109/HiPC56025.2022.00042 (December 2022)
Abstract: This paper presents IRIS-BLAS, a novel heterogeneous and performance-portable BLAS library. IRIS-BLAS is built on top of the IRIS runtime and multiple vendor and open-source BLAS libraries. It can transparently use all the architectures/devices available in a heterogeneous system, selecting the appropriate BLAS library based on the task mapping at run time. IRIS-BLAS is thus portable across a broad spectrum of architectures and BLAS libraries, freeing application developers from modifying their application source code. Although the emphasis is on portability, IRIS-BLAS provides performance competitive with or better than other state-of-the-art references. Moreover, IRIS-BLAS offers new capabilities, such as efficiently using extremely heterogeneous systems composed of multiple GPUs from different hardware vendors.
Citations: 3
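
The abstract does not expose IRIS-BLAS's API, so the following is only a minimal sketch of the run-time dispatch idea: the same GEMM task is routed to whichever BLAS backend matches the device the runtime mapped it onto. All names (`Device`, `BACKENDS`, `gemm`) are hypothetical.

```python
# Minimal sketch of run-time BLAS dispatch in the spirit of IRIS-BLAS.
# All names (Device, BACKENDS, gemm) are hypothetical; the real library
# builds on the IRIS runtime and vendor libraries such as cuBLAS.
from dataclasses import dataclass

import numpy as np

@dataclass
class Device:
    kind: str                     # e.g., "cpu", "nvidia-gpu", "amd-gpu"

def openblas_gemm(a, b):
    # CPU path: NumPy itself dispatches to an optimized BLAS.
    return a @ b

def cublas_gemm(a, b):
    # Placeholder for a cuBLAS call issued through the runtime.
    raise NotImplementedError("would call cuBLAS on an NVIDIA device")

# Task mapping decided at run time: device kind -> BLAS backend.
BACKENDS = {"cpu": openblas_gemm, "nvidia-gpu": cublas_gemm}

def gemm(a, b, device: Device):
    # The caller issues one portable GEMM; the mapping picks the library.
    return BACKENDS[device.kind](a, b)

a, b = np.ones((4, 8)), np.ones((8, 2))
print(gemm(a, b, Device("cpu")).shape)   # (4, 2)
```
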
Parallel Vertex Color Update on Large Dynamic Networks
A. Khanda, S. Bhowmick, Xin Liang, Sajal K. Das
DOI: 10.1109/HiPC56025.2022.00027 (December 2022)
Abstract: We present the first GPU-based parallel algorithm to efficiently update vertex coloring on large dynamic networks. For a single GPU, we introduce the concept of a loosely maintained vertex color update that reduces computation and memory requirements. For multiple GPUs in distributed environments, we propose a priority-based ordering of vertices to reduce communication time. We prove the correctness of our algorithms and experimentally demonstrate that, for graphs of over 16 million vertices and over 134 million edges on a single GPU, our dynamic algorithm is as much as 20x faster than a state-of-the-art algorithm on static graphs. For larger graphs with over 130 million vertices and over 260 million edges, our distributed implementation with 8 GPUs produces updated color assignments within 160 milliseconds. In all cases, the proposed parallel algorithms produce comparable or fewer colors than state-of-the-art algorithms.
Citations: 1
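
As a rough illustration of the incremental idea (not the authors' GPU algorithm), here is a serial sketch: after inserting edges, only endpoints whose colors now conflict are recolored, so work stays proportional to the affected region rather than the whole graph. The graph representation and helper names are illustrative.

```python
# Serial sketch of incremental recoloring (the paper's contribution is
# a GPU-parallel version).  `adj` is an adjacency dict of sets; `color`
# maps each vertex to an integer color.
def smallest_free_color(adj, color, v):
    used = {color[u] for u in adj[v]}
    c = 0
    while c in used:
        c += 1
    return c

def update_colors(adj, color, new_edges):
    # Insert the edges, then repair only endpoints whose colors now
    # conflict; this loose update touches the affected region instead
    # of recoloring the whole graph.
    conflicted = set()
    for u, v in new_edges:
        adj[u].add(v)
        adj[v].add(u)
        if color[u] == color[v]:
            conflicted.add(max(u, v))       # deterministic tie-break
    for v in conflicted:
        color[v] = smallest_free_color(adj, color, v)

adj = {0: {1}, 1: {0}, 2: set()}
color = {0: 0, 1: 1, 2: 0}
update_colors(adj, color, [(0, 2)])
print(color)   # vertex 2 moves off color 0
```
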
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters
Nawras Alnaasan, Arpan Jain, A. Shafi, H. Subramoni, D. Panda
DOI: 10.1109/HiPC56025.2022.00017 (December 2022)
Abstract: Deep Learning (DL) has become a prominent machine learning technique due to the availability of efficient computational resources in the form of Graphics Processing Units (GPUs), large-scale datasets, and a variety of models. Newer generations of GPUs are designed with special emphasis on optimizing performance for DL applications, and easy-to-use DL frameworks such as PyTorch and TensorFlow have made domain experts from diverse fields more productive in building custom DL applications. However, existing Deep Neural Network (DNN) training approaches may not fully utilize newly emerging powerful GPUs like the NVIDIA A100; this is the primary issue we address in this paper. Our motivating analyses show that GPU utilization on the NVIDIA A100 can be as low as 43% using traditional DNN training approaches for small-to-medium DL models and input data sizes. This paper proposes AccDP, a data-parallel distributed DNN training approach, to accelerate GPU-based DL applications. AccDP couples the Message Passing Interface (MPI) communication library with NVIDIA's Multi-Process Service (MPS) to increase the amount of work assigned to parallel GPUs, resulting in higher utilization of compute resources. We evaluate our proposed design on different small-to-medium DL models and input sizes on state-of-the-art HPC clusters. By injecting more parallelism into DNN training with our approach, the evaluation shows up to 58% improvement in training performance on a single GPU and up to 62% on 16 GPUs compared to regular DNN training. Furthermore, we conduct an in-depth characterization of several DNN training factors and best practices, including the batch size and the number of data-loading workers, to optimally utilize GPU devices. To the best of our knowledge, this is the first work that explores the use of MPS and MPI to maximize the utilization of GPUs in distributed DNN training.
Citations: 0
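
A minimal sketch of the general pattern the paper builds on, MPI-based data parallelism in which several ranks drive the same GPU, follows. It uses mpi4py and PyTorch; the model, launch line, and sharding are illustrative, and the MPS control daemon (which lets the ranks' kernels share each GPU) is assumed to be running. This is not the authors' AccDP code.

```python
# Sketch of MPI data-parallel training with ranks sharing GPUs via MPS
# (not the authors' AccDP code).  Launch with something like:
#   mpirun -np 4 python train.py
# with the CUDA MPS daemon running so ranks' kernels overlap on each
# GPU instead of being time-sliced.
from mpi4py import MPI
import torch
import torch.nn.functional as F

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
device = torch.device("cuda", rank % torch.cuda.device_count())

model = torch.nn.Linear(32, 4).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(10):
    # Each rank trains on its own shard of the (synthetic) batch.
    x = torch.randn(64, 32, device=device)
    y = torch.randint(0, 4, (64,), device=device)
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    # Average gradients across ranks to keep the replicas in sync.
    for p in model.parameters():
        avg = comm.allreduce(p.grad.cpu().numpy(), op=MPI.SUM) / size
        p.grad.copy_(torch.from_numpy(avg).to(device))
    opt.step()
```
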
Provenance-based Workflow Diagnostics Using Program Specification
Yuta Nakamura, T. Malik, Iyad A. Kanj, Ashish Gehani
DOI: 10.1109/HiPC56025.2022.00046 (December 2022)
Abstract: Workflow management systems (WMSes) help automate and coordinate scientific modules and monitor their execution. WMSes are also used to repeat a workflow application with different inputs to test the sensitivity and reproducibility of runs. However, when differences arise in outputs across runs, current WMSes do not audit sufficient provenance metadata to determine where the execution first differed, which increases diagnostic time and leads to poor-quality diagnostic results. In this paper, we use program specification to precisely determine the locations where workflow execution differs, and we use existing audited provenance to isolate the modules where execution differs. We show that using program specification incurs some increased storage overhead, due to mapping provenance data flows onto the program specification, but leads to better-quality diagnostics in terms of the number of differences found and their location, relative to comparing provenance metadata audited within current WMSes.
Citations: 0
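
To make the mechanism concrete, here is a toy sketch of specification-guided divergence isolation: the "program specification" is reduced to an ordered module list, and each run's provenance to per-module output digests. All names are illustrative.

```python
# Toy sketch of specification-guided divergence isolation.  The
# "program specification" is reduced to an ordered list of workflow
# modules, and each run's provenance to a digest of every module's
# outputs; all names are illustrative.
import hashlib

SPEC = ["ingest", "clean", "train", "report"]   # module order from the spec

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def first_divergence(prov_a: dict, prov_b: dict):
    # Walk modules in specification order; report the first module
    # whose recorded outputs differ between the two runs.
    for module in SPEC:
        if prov_a[module] != prov_b[module]:
            return module
    return None   # the runs agree everywhere

run1 = {m: digest(m.encode()) for m in SPEC}
run2 = dict(run1, train=digest(b"perturbed-output"))
print(first_divergence(run1, run2))   # -> "train"
```
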
memwalkd: Accelerating Key-value Stores using Page Table Walkers
R. S. Anupindi, Swaroop Kotni, Arkaprava Basu
DOI: 10.1109/HiPC56025.2022.00021 (December 2022)
Abstract: In-memory key-value stores (KVS) or caches form the backbone of many commercial and HPC applications. The basic operation of a KVS revolves around storing or updating the mapping from keys to their corresponding values and looking up that mapping when requested by a client. We observe that the memory management unit (MMU) in modern processors does something similar: it looks up the mapping between virtual addresses and physical addresses stored in the per-process page table. We leverage the MMU to gain hardware acceleration for key-value lookup for free in a new key-value store design called memwalkd. We hash keys to unique virtual addresses; these addresses map to the physical addresses that hold the corresponding values. Thus, GETs/SETs are performed by simply issuing loads/stores to the hash of a key. Across a wide range of workloads, memwalkd achieves 1.8× better throughput than a highly optimized implementation of memcached called MICA [1].
Citations: 0
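
The hardware trick itself cannot be reproduced in user space, but the addressing scheme can be emulated: derive an address from the key's hash and perform GET/SET as plain loads and stores into one large demand-paged region. In memwalkd, the MMU and page-table walker resolve such addresses in hardware; this sketch ignores hash collisions and uses fixed-size value slots for brevity.

```python
# User-space emulation of the addressing scheme: hash the key to an
# offset in one large anonymous (demand-paged) mapping and treat slice
# reads/writes as the loads/stores.  memwalkd itself arranges the
# mapping so the hardware page-table walker resolves the lookup; this
# sketch ignores collisions and uses fixed-size value slots.
import hashlib
import mmap

SLOT = 64                                  # bytes per value slot
NSLOTS = 1 << 20                           # ~1M slots
region = mmap.mmap(-1, NSLOTS * SLOT)      # anonymous memory region

def slot_offset(key: bytes) -> int:
    h = hashlib.blake2b(key, digest_size=8).digest()
    return (int.from_bytes(h, "little") % NSLOTS) * SLOT

def kv_set(key: bytes, value: bytes) -> None:
    off = slot_offset(key)
    region[off:off + SLOT] = value.ljust(SLOT, b"\0")   # the "store"

def kv_get(key: bytes) -> bytes:
    off = slot_offset(key)
    return region[off:off + SLOT].rstrip(b"\0")         # the "load"

kv_set(b"hello", b"world")
print(kv_get(b"hello"))   # b'world'
```
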
EECAAP: Efficient Edge-Computing based Anonymous Authentication Protocol for IoV
Himani Sikarwar, D. Das
DOI: 10.1109/HiPC56025.2022.00047 (December 2022)
Abstract: Traditional security solutions for the edge face challenges such as high power consumption and communication and computation overhead, and they are not feasible for highly dynamic, resource-constrained (memory and processing power) Internet of Vehicles (IoV) networks. Physical Unclonable Functions (PUFs) can address this issue. This paper presents a new PUF-based anonymous mutual authentication and key exchange protocol for the IoV communication environment that combines unique, unpredictable PUFs with one-way hash functions. The proposed protocol uses a three-layered infrastructure for IoV networks (a vehicle layer, an edge computing layer, and a cloud layer) to improve efficiency, throughput, and Quality of Service (QoS). The proposed hybrid system protects against cloning, side-channel, and physical attacks with low computation cost and negligible storage cost. Implementation and performance analysis show that the proposed protocol reduces computation and communication overhead by up to 80% and 60%, respectively, and also reduces time complexity.
Citations: 0
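
A toy challenge-response flow combining a PUF with one-way hashes is sketched below. A real PUF derives its response from device-unique physical variation; an HMAC over a stored secret merely stands in for it here, and the message flow is illustrative rather than the paper's exact protocol.

```python
# Toy PUF-based challenge-response authentication (illustrative only;
# not the paper's exact message flow).  A real PUF derives responses
# from device-unique physical variation; HMAC over a stored secret is
# only a software stand-in here.
import hashlib
import hmac
import os

class VehicleDevice:
    def __init__(self):
        self._puf_secret = os.urandom(32)   # stand-in for the physical PUF
    def puf(self, challenge: bytes) -> bytes:
        return hmac.new(self._puf_secret, challenge, hashlib.sha256).digest()

vehicle = VehicleDevice()

# Enrollment (once, in a secure setting): the edge server records a
# challenge/response pair (CRP) for the vehicle.
challenge = os.urandom(16)
stored_response = vehicle.puf(challenge)

# Authentication: the edge sends the challenge plus a fresh nonce; the
# vehicle proves possession of the PUF without revealing the raw
# response, since only a one-way hash crosses the network.
nonce = os.urandom(16)
proof = hashlib.sha256(vehicle.puf(challenge) + nonce).digest()
assert proof == hashlib.sha256(stored_response + nonce).digest()

# Both sides can then derive a shared session key from the same values.
session_key = hashlib.sha256(stored_response + nonce + b"session").digest()
print(session_key.hex()[:16])
```
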
HiPC 2022 Technical Program Committee
DOI: 10.1109/hipc56025.2022.00009 (December 2022)
Citations: 0
Scaling the SOO Global Blackbox Optimizer on a 128-core Architecture
David Redon, B. Derbel, P. Fortin
DOI: 10.1109/HiPC56025.2022.00037 (December 2022)
Abstract: Blackbox optimization refers to the situation where no analytical knowledge about the problem is available beforehand, which is the case in a number of application fields, e.g., multi-disciplinary design and simulation optimization. In this context, the Simultaneous Optimistic Optimization (SOO) algorithm is a deterministic tree-based global optimizer with theoretically provable performance guarantees under mild conditions. In this paper, we consider the efficient shared-memory parallelization of SOO on a high-end HPC architecture with dozens of CPU cores. We propose different strategies based on exposing the levels of parallelism underlying the SOO algorithm. We show that the naive approach, performing multiple evaluations of the blackbox function in parallel, does not scale with the number of cores. By contrast, we show that a parallel design based on the SOO-tree traversal provides substantial improvements in scalability and performance. We validate our strategies with a detailed performance analysis on a compute server with two 64-core processors, using a number of diverse benchmark functions with both increasing dimensions and numbers of cores.
Citations: 0
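
For readers unfamiliar with SOO, the following compact 1-D version shows where parallelism arises during the tree traversal: within one sweep, each expanded leaf spawns several independent blackbox evaluations that can run concurrently. This simplified sketch parallelizes only those evaluations; the paper's strategies for a 128-core machine are more elaborate.

```python
# Compact 1-D SOO with parallel evaluation of new cell centers; this
# simplified sketch parallelizes only the blackbox calls spawned by
# each expansion, one of several parallelism levels in the paper.
from concurrent.futures import ProcessPoolExecutor

def objective(x):
    return -(x - 0.3) ** 2            # toy blackbox; maximum at x = 0.3

def soo(f, lo=0.0, hi=1.0, rounds=20, workers=4):
    # leaves[d] holds the tree's depth-d cells as (lo, hi, f(center)).
    leaves = {0: [(lo, hi, f((lo + hi) / 2))]}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            vbest = float("-inf")
            for depth in sorted(leaves):
                if not leaves[depth]:
                    continue
                # Optimistic rule: expand the best cell at this depth
                # only if it beats everything expanded above it.
                cell = max(leaves[depth], key=lambda c: c[2])
                a, b, v = cell
                if v <= vbest:
                    continue
                vbest = v
                leaves[depth].remove(cell)
                thirds = [(a + (b - a) * i / 3, a + (b - a) * (i + 1) / 3)
                          for i in range(3)]
                # Independent blackbox evaluations run concurrently.
                vals = pool.map(f, [(l + r) / 2 for l, r in thirds])
                leaves.setdefault(depth + 1, []).extend(
                    (l, r, w) for (l, r), w in zip(thirds, vals))
    return max(v for cells in leaves.values() for _, _, v in cells)

if __name__ == "__main__":
    print(soo(objective))   # approaches 0, the optimum
```
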
Towards Efficient Cache Allocation for High-Frequency Checkpointing
Avinash Maurya, Bogdan Nicolae, M. M. Rafique, Amr M. Elsayed, T. Tonellot, F. Cappello
DOI: 10.1109/HiPC56025.2022.00043 (December 2022)
Abstract: While many HPC applications are known to have long runtimes, this is not always because of single large runs: in many cases, it is due to ensembles composed of many short runs (runtimes on the order of minutes). When each such run needs to checkpoint frequently (e.g., adjoint computations using a checkpoint interval on the order of milliseconds), it is important to minimize both the checkpointing overhead at each iteration and the initialization overhead. With the rising popularity of GPUs, minimizing both overheads simultaneously is challenging: while it is possible to take advantage of efficient asynchronous data transfers between GPU and host memory, this comes at the cost of the high initialization overhead needed to allocate and pin host memory. In this paper, we contribute an efficient technique to address this challenge. The key idea is an adaptive approach that delays the pinning of the host memory buffer holding the checkpoints until all memory pages are touched, which greatly reduces the overhead of registering the host memory with the CUDA driver. To this end, we combine asynchronous touching of memory pages with direct writes of checkpoints to untouched and touched memory pages, minimizing end-to-end checkpointing overhead based on performance modeling. Our evaluations show a significant improvement over a variety of alternative static allocation strategies and state-of-the-art approaches.
Citations: 2
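
A simplified sketch of the delayed-pinning idea follows, using CuPy's runtime bindings (hostRegister wraps cudaHostRegister). The paper's adaptive, performance-model-driven policy is reduced here to a fixed touch-then-register sequence, so this only illustrates why touching pages before registration pays off.

```python
# Touch-then-register sketch of delayed pinning with CuPy (the paper's
# adaptive, performance-model-driven scheme is simplified to two fixed
# phases).  cupy.cuda.runtime.hostRegister wraps cudaHostRegister.
import cupy as cp
import numpy as np

state = cp.random.rand(1 << 20)                  # device data to checkpoint
buf = np.empty(state.shape, dtype=state.dtype)   # pageable host buffer

# Phase 1: touch every page first (here with an initializing write).
# Registering already-resident pages with the CUDA driver is much
# cheaper than pinning cold, untouched pages up front.
buf.fill(0)

# Phase 2: register (pin) the touched buffer once, then issue the
# high-frequency checkpoints as asynchronous device-to-host copies.
cp.cuda.runtime.hostRegister(buf.ctypes.data, buf.nbytes, 0)
stream = cp.cuda.Stream(non_blocking=True)
for step in range(100):
    state.get(stream=stream, out=buf)            # async D2H into pinned buf
stream.synchronize()
cp.cuda.runtime.hostUnregister(buf.ctypes.data)
```
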
Joint Partitioning and Sampling Algorithm for Scaling Graph Neural Network
Manohar Lal Das, Vishwesh Jatala, Gagan Raj Gupta
DOI: 10.1109/HiPC56025.2022.00018 (December 2022)
Abstract: Graph Neural Networks (GNNs) have emerged as a popular toolbox for solving complex problems on graph data structures. Graph neural networks use machine learning techniques to learn vector representations of nodes and/or edges. Learning these representations demands a huge amount of memory and computing power. Traditional shared-memory multiprocessors are insufficient to meet the computing requirements of real-world data; hence, research has gained momentum toward distributed GNN training. Scaling distributed GNN training poses the following challenges: (1) the input graph needs to be efficiently partitioned, (2) the cost of communication between compute nodes should be reduced, and (3) the sampling strategy should be chosen to minimize the loss in accuracy. To address these challenges, we propose a joint partitioning and sampling algorithm that partitions the input graph with weighted METIS and uses a biased sampling strategy to minimize total communication cost. We implemented our approach using the DistDGL framework and evaluated it on several real-world datasets. We observe that, compared to the state-of-the-art DistDGL implementation, our approach (1) reduces communication overhead by 53% on average, (2) requires less time to partition a graph, (3) improves accuracy, and (4) achieves a speedup of 1.5x on the OGB-Arxiv dataset.
Citations: 0
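
The biased-sampling half of the idea can be sketched compactly: given a node partition (which the paper computes with weighted METIS), neighbors in the trainer's own partition are sampled with higher probability, trading a little sampling diversity for less cross-machine traffic. The bias factor and fanout are illustrative knobs, and sampling is with replacement for brevity.

```python
# Sketch of biased neighbor sampling over a given node partition (the
# paper computes the partition with weighted METIS; this is not the
# authors' code).
import random

def biased_sample(adj, part, v, fanout=2, local_bias=4.0):
    nbrs = adj[v]
    # Neighbors in v's own partition get `local_bias` times the weight
    # of remote neighbors, so sampled subgraphs stay mostly local.
    weights = [local_bias if part[u] == part[v] else 1.0 for u in nbrs]
    return random.choices(nbrs, weights=weights, k=min(fanout, len(nbrs)))

adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
part = {0: 0, 1: 0, 2: 1, 3: 1}        # e.g., METIS output
print(biased_sample(adj, part, 0))     # mostly samples node 1 (local)
```
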