2020 IEEE High Performance Extreme Computing Conference (HPEC): Latest Papers, Page 2

On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286152

J. Firoz, Ang Li, Jiajia Li, K. Barker

Citations: 3

Combinatorial Tiling for Sparse Neural Networks

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286154

Filip Pawlowski, R. Bisseling, B. Uçar, Albert-Jan N. Yzelman

Citations: 5

Hash Table Scalability on Intel PIUMA

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286204

B. Seshasayee, J. Fryman, I. Hur

Citations: 1

A High Throughput Parallel Hash Table on FPGA using XOR-based Memory

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286199

Ruizhi Zhang, Sasindu Wijeratne, Yang Yang, S. Kuppannagari, V. Prasanna

Abstract: The hash table is a fundamental data structure for quick search and retrieval of data. It is a key component in complex graph analytics and AI/ML applications. State-of-the-art parallel hash table implementations either make simplifying assumptions, such as supporting only a subset of hash table operations, or employ optimizations that lead to performance that is highly data dependent and in the worst case can be similar to a sequential implementation. In contrast, in this work we develop a dynamic hash table that supports all the hash table queries (search, insert, delete, update) while supporting $p$ parallel queries ($p > 1$) per clock cycle via $p$ processing engines (PEs) in the worst case, i.e., the performance is data agnostic. We achieve this by implementing novel XOR-based multi-ported block memories on FPGAs. Additionally, we develop a technique to optimize the memory requirement of the hash table if the ratio of search to insert/update/delete queries is known beforehand. We implement our design on state-of-the-art FPGA devices. Our design is scalable to 16 PEs and supports throughput up to 5926 MOPS. It matches the throughput of the state-of-the-art hash table design, FASTHash, which only supports search and insert operations. Compared with the best FPGA design that supports the same set of operations, our hash table achieves up to 12.3× speedup.

Citations: 5

Hybrid Approach to HPC Cluster Telemetry and Hardware Log Analytics

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286239

J. Thaler, Woong Shin, S. Roberts, James H. Rogers, Todd J. Rosedahl

Abstract: The number of computer processing nodes and processor cores in cluster systems is growing rapidly. Discovering, and reacting to, a hardware or environmental issue in a timely manner enables proper fault isolation, improves quality of service, and improves system up-time. In the case of performance impacts and node outages, RAS policies can direct actions such as job quiescence or migration. Additionally, power consumption, thermal information, and utilization metrics can be used to provide cluster energy and cooling efficiency improvements as well as optimized job placement. This paper describes a highly scalable telemetry architecture that allows event aggregation and application of RAS policies, and provides the ability for cluster control system feedback. The architecture advances existing approaches by including both programmable policies, which are applied as events stream through the hierarchical network to persistent storage, and treatment of sensor telemetry in an extensible framework. This implementation has proven robust and is in use in both cloud and HPC environments, including the Summit system of 4,608 nodes at Oak Ridge National Laboratory [5].

Citations: 1

Homomorphic Encryption Based Secure Sensor Data Processing

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286175

V. Gadepally, Mihailo Isakov, R. Agrawal, J. Kepner, K. Gettings, M. Kinsy

Citations: 0

Computing PageRank Scores of Web Crawl Data Using DGX A100 Clusters

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286216

Seunghwa Kang, Alexandre Fender, Joe Eaton, Brad Rees

Citations: 4

Analysis of floating-point round-off error in linear algebra routines for graph clustering

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286190

L. Yang, Alyson Fox

Citations: 0

Inference Benchmarking on HPC Systems

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286138

W. Brewer, G. Behm, A. Scheinine, Ben Parsons, Wesley Emeneker, Robert P. Trevino

Abstract: As deep learning on edge computing systems has become more prevalent, investigation of architectures and configurations for optimal inference performance has become a critical step for proposed artificial intelligence solutions. While there has been considerable work in the development of hardware and software for high performance inferencing, little is known about the performance of such systems on HPC architectures. In this paper, we address outstanding questions on parallel inference performance on HPC systems. We report results and recommendations derived from evaluating iBench on multiple platforms in a variety of HPC configurations. We systematically benchmark single-GPU performance, single-node performance, and multi-node performance for maximum client-side and server-side inference throughput. In order to achieve linear speedup, we show that concurrent sending clients must be used, as opposed to sending large batch payloads parallelized across multiple GPUs. We show that client/server inferencing architectures add a considerable data transfer component that needs to be taken into consideration when benchmarking HPC systems, and that benchmarks such as MLPerf do not measure. Finally, we investigate energy efficiency of GPUs for different levels of concurrency and batch sizes to report optimal configurations that minimize cost per inference.

Citations: 6

Fast GPU Graph Contraction by Combining Efficient Shallow Searches and Post-Culling

2020 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2020-09-22 DOI: 10.1109/HPEC43674.2020.9286141

Roozbeh Karimi, David M. Koppelman, C. J. Michael

Abstract: Efficient GPU single-source shortest-path (SSSP) queries of road network graphs can be realized by a technique called PHAST (Delling et al.), in which the graph is contracted (pre-processed using Geisberger's Contraction Hierarchies) once and the resulting contracted graph is queried as needed. PHAST accommodates GPUs' parallelism requirements well, resulting in efficient queries. For situations in which a graph is not available well in advance or changes frequently, contraction time itself becomes significant. Karimi et al. recently described a GPU contraction technique, CU-CH, which significantly reduces the contraction time of small- to medium-sized graphs, reporting a speedup of over 20× on an NVidia P100 GPU. However, CU-CH realizes little speedup on larger graphs, such as DIMACS' USA and W. Europe graphs. The obstacle to faster contraction of larger graphs is the frequently performed witness path search (WPS). A WPS for a node determines which shortcut edges need to be added between the node's neighbors to maintain distances after the removal of the node. GPUs' strict thread convergence requirements and limited scratchpad preclude the bidirectional Dijkstra approach used in CPU implementations. Instead, CU-CH uses a two-hop-limit WPS tightly coded to fit GPU shared storage and to maintain thread convergence. Where two hops are sufficient, speedup is high, but for larger graphs the hop limit exacts a toll due to the accumulation of unneeded shortcuts. The problem is overcome here by retaining the efficient CU-CH WPS but using it both for its original purpose and also to identify unnecessary shortcuts added in prior steps. The unnecessary shortcuts are culled (removed). Culling shortcuts not only dramatically reduces the time needed to contract a graph but also improves the quality of the contracted graph. For smaller graphs such as DIMACS Cal (travel time), contraction time is 61% faster compared to CU-CH. For the DIMACS Europe and USA graphs, contraction times are 40× and 12× faster, respectively. SSSP query times also improve dramatically, approaching those obtained on aggressively contracted graphs. The speedup over Geisberger's CPU code is over 100 times for NVidia V100 GPUs on most graphs tried.
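The witness path search described in the abstract above can be illustrated with a small CPU-side sketch. This is not the CU-CH code or the paper's GPU kernel; it is a minimal Python illustration, assuming an undirected graph stored as a dict of dicts, and the names `two_hop_shortcuts` and `make_graph` are hypothetical. For each pair of neighbors of the node being removed, it looks for a witness path of at most two hops that avoids the node, and emits a shortcut only when no such witness is short enough.

```python
INF = float("inf")

def two_hop_shortcuts(graph, v):
    """Return shortcuts (u, w, weight) needed if node v is removed.

    graph is an undirected adjacency map: graph[a][b] == edge weight.
    Only witness paths of at most two hops (u-w or u-x-w) are examined,
    so a longer witness is missed and an unneeded shortcut may be added.
    """
    shortcuts = []
    nbrs = sorted(graph[v])
    for i, u in enumerate(nbrs):
        for w in nbrs[i + 1:]:
            via_v = graph[v][u] + graph[v][w]   # distance u -> v -> w
            best = graph[u].get(w, INF)         # one-hop witness
            for x, d_ux in graph[u].items():    # two-hop witnesses u -> x -> w
                if x in (v, w):
                    continue
                best = min(best, d_ux + graph[x].get(w, INF))
            if best > via_v:                    # no witness preserves the distance
                shortcuts.append((u, w, via_v))
    return shortcuts

def make_graph(edges):
    """Build a symmetric adjacency map from (a, b, weight) triples."""
    graph = {}
    for a, b, wt in edges:
        graph.setdefault(a, {})[b] = wt
        graph.setdefault(b, {})[a] = wt
    return graph

# Toy example: node 0 has neighbors 1, 2, 3.
graph = make_graph([
    (0, 1, 1), (0, 2, 1), (0, 3, 1),
    (1, 2, 5),                # direct edge, but longer than going through 0
    (1, 4, 0.5), (4, 3, 0.5)  # two-hop witness 1-4-3 of length 1.0
])
result = two_hop_shortcuts(graph, 0)
print(result)  # [(1, 2, 2), (2, 3, 2)]
```

In this toy graph the pair (1, 3) needs no shortcut because the path 1-4-3 is a witness. If that witness had three or more hops, the hop-limited search would miss it and add a shortcut anyway, which is the kind of unneeded shortcut the abstract says is later culled.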

Citations: 0