2019 IEEE High Performance Extreme Computing Conference (HPEC): Latest Papers (Page 8)

Progressive Optimization of Batched LU Factorization on GPUs

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916270

A. Abdelfattah, S. Tomov, J. Dongarra

Citations: 4

A data-driven framework for uncertainty quantification of a fluidized bed

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916467

V. Kotteda, Anitha Kommu, Vinod Kumar

Citations: 0

Scalable Inference for Sparse Deep Neural Networks using Kokkos Kernels

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916378

J. Ellis, S. Rajamanickam

Abstract: Over the last decade, hardware advances have led to the feasibility of training and inference for very large deep neural networks. Sparsified deep neural networks (DNNs) can greatly reduce memory costs and increase throughput of standard DNNs, if loss of accuracy can be controlled. The IEEE HPEC Sparse Deep Neural Network Graph Challenge serves as a testbed for algorithmic and implementation advances to maximize computational performance of sparse deep neural networks. We base our sparse network for DNNs, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library. Using the sparse matrix-matrix multiplication in Kokkos Kernels allows us to reuse a highly optimized kernel. We focus on reducing the single node and multi-node runtimes for 12 sparse networks. We test KK-SpDNN on Intel Skylake and Knights Landing architectures and see 120-500x improvement on single node performance over the serial reference implementation. We run in data-parallel mode with MPI to further speed up network inference, ultimately obtaining an edge processing rate of 1.16e+12 on 20 Skylake nodes. This translates to a 13x speedup on 20 nodes compared to our highly optimized multithreaded implementation on a single Skylake node.
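The abstract above builds inference on repeated sparse matrix-matrix multiplication. As a rough illustration only, the sketch below runs layer-by-layer sparse inference in pure Python. It is not KK-SpDNN and does not use Kokkos Kernels; the nested-dict storage, the function names (`spgemm`, `activate`, `infer`), and the bias and cap values are all assumptions made for this example.

```python
# Sparse matrices are stored as {row: {col: value}} (an illustrative choice).
# Assumed layer rule: Y <- min(relu(Y @ W + bias), cap), with bias <= 0 so that
# entries that are zero before the bias stay zero and the result remains sparse.

def spgemm(a, b):
    """Multiply two sparse matrices held as nested dicts (row-by-row product)."""
    out = {}
    for i, row in a.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            out[i] = acc
    return out


def activate(y, bias, cap):
    """Add the bias to stored entries, apply ReLU, clip at cap, drop zeros."""
    out = {}
    for i, row in y.items():
        kept = {}
        for j, v in row.items():
            v = min(v + bias, cap)
            if v > 0.0:
                kept[j] = v
        if kept:
            out[i] = kept
    return out


def infer(y0, weights, bias, cap=32.0):
    """Apply every layer in turn: one sparse product plus one activation."""
    y = y0
    for w in weights:
        y = activate(spgemm(y, w), bias, cap)
    return y


# Two input rows, two features, and the same hypothetical layer applied twice.
y0 = {0: {0: 1.0, 1: 1.0}, 1: {1: 1.0}}
w = {0: {0: 0.5}, 1: {0: 0.5, 1: 0.25}}
result = infer(y0, [w, w], bias=-0.25)
print(result)  # {0: {0: 0.125}}
```

Only row 0 keeps a nonzero activation after two layers, which shows how the activation step prunes rows so that later products do less work.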

Citations: 17

Exploring the Efficiency of OpenCL Pipe for Hiding Memory Latency on Cloud FPGAs

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916236

Arnab A. Purkayastha, S. Raghavendran, Jhanani Thiagarajan, H. Tabkhi

Abstract: OpenCL programming ability combined with OpenCL High-Level Synthesis (OpenCL-HLS) tools have made tremendous improvements in the reconfigurable computing field. FPGAs' inherent pipelined parallelism capability provides not only faster execution times but also power-efficient solutions when executing massively parallel applications. A major execution bottleneck affecting FPGA performance is the high number of memory stalls exposed to the pipelined data-path, which hinders the benefits of data-path customization. This paper explores the efficiency of "OpenCL Pipe" to hide memory access latency on cloud FPGAs by decoupling memory access from computation. The Pipe semantic is leveraged to split OpenCL kernels into "read", "compute" and "write back" sub-kernels which work concurrently to overlap the computation of current threads with the memory access of future threads. For evaluation, we use a mix of seven massively parallel high-performance applications from the Rodinia suite (version 3.1). All our tests are conducted on the Xilinx VU9P FPGA platform of the Amazon cloud-based AWS EC2 F1 instance. On average, we observe a 5.2x speedup with a 2.2x increase in memory bandwidth utilization and about a 2.5x increase in FPGA resource utilization over the baseline synthesis (Xilinx OpenCL-HLS). This work has been funded and supported by the Xilinx University Program (XUP).
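The read / compute / write-back split described in the abstract above can be pictured with a small software analogy. The sketch below is not OpenCL and not the authors' kernels: it uses Python threads as stand-ins for the three sub-kernels and bounded `queue.Queue` objects as stand-ins for pipes. The names `read_stage`, `compute_stage`, `write_stage`, `run_pipeline`, and the `depth` parameter are all hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def read_stage(src, to_compute):
    """Stand-in for the 'read' sub-kernel: fetch items and push them downstream."""
    for item in src:
        to_compute.put(item)
    to_compute.put(_DONE)


def compute_stage(to_compute, to_write, fn):
    """Stand-in for the 'compute' sub-kernel: it never touches src or dst."""
    while True:
        item = to_compute.get()
        if item is _DONE:
            to_write.put(_DONE)
            break
        to_write.put(fn(item))


def write_stage(to_write, dst):
    """Stand-in for the 'write back' sub-kernel: drain results into dst."""
    while True:
        item = to_write.get()
        if item is _DONE:
            break
        dst.append(item)


def run_pipeline(src, fn, depth=4):
    """Run the three stages concurrently, linked by bounded FIFO queues."""
    to_compute = queue.Queue(maxsize=depth)
    to_write = queue.Queue(maxsize=depth)
    dst = []
    stages = [
        threading.Thread(target=read_stage, args=(src, to_compute)),
        threading.Thread(target=compute_stage, args=(to_compute, to_write, fn)),
        threading.Thread(target=write_stage, args=(to_write, dst)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return dst


squares = run_pipeline(range(8), lambda x: x * x)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because each queue has a single producer and a single consumer, results arrive in input order. The bounded `depth` lets the reader run ahead of the compute stage by a few items, which is the overlap of future memory accesses with current computation that the abstract describes.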

Citations: 3

Scalable Triangle Counting on Distributed-Memory Systems

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916302

Seher Acer, Abdurrahman Yasar, S. Rajamanickam, Michael M. Wolf, Ümit V. Çatalyürek

Abstract: Triangle counting is a foundational graph-analysis kernel in network science. It has also been one of the challenge problems for the "Static Graph Challenge". In this work, we propose a novel, hybrid, parallel triangle counting algorithm based on its linear algebra formulation. Our framework uses MPI and Cilk to exploit the benefits of distributed-memory and shared-memory parallelism, respectively. The problem is partitioned among MPI processes using a two-dimensional (2D) Cartesian block partitioning. One-dimensional (1D) rowwise partitioning is used within the Cartesian blocks for shared-memory parallelism using the Cilk programming model. Besides exhibiting very good strong scaling behavior in almost all tested graphs, our algorithm achieves the fastest time on the 1.4B edge real-world Twitter graph, which is 3.217 seconds, on 1,092 cores. In comparison to past distributed-memory parallel winners of the graph challenge, we demonstrate a speedup of 2.7× on this Twitter graph. This is also the fastest time reported for parallel triangle counting on the Twitter graph when the graph is not replicated.
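The abstract above refers to the linear algebra formulation of triangle counting. One common serial form of that idea is: take the strictly lower-triangular part L of the adjacency matrix, then sum the entries of (L·L) masked elementwise by L. The sketch below shows that form only, assuming an undirected graph given as an edge list. It does not reproduce the authors' 2D-partitioned MPI and Cilk algorithm, and `count_triangles` is a name chosen for this example.

```python
def count_triangles(edges):
    """Count triangles as the sum over (i, j) of (L @ L)[i][j] * L[i][j]."""
    # Row i of L, stored as a set: the neighbors of i with a smaller id.
    lower = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        hi, lo = max(u, v), min(u, v)
        lower.setdefault(hi, set()).add(lo)

    total = 0
    for i, row in lower.items():
        for k in row:
            # Row k of L contributes to row i of L @ L; keeping only the
            # columns j already present in row i applies the mask L[i][j].
            total += len(lower.get(k, set()) & row)
    return total


# A complete graph on vertices 0..3 (4 triangles) plus one pendant edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(count_triangles(edges))  # 4
```

Ordering the vertices as i > k > j means each triangle is counted exactly once, so no division by a symmetry factor is needed.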

Citations: 12

Fast Triangle Counting on GPU

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916216

Chuangyi Gui, Long Zheng, Pengcheng Yao, Xiaofei Liao, Hai Jin

Citations: 0

Lossless Compression of Internal Files in Parallel Reservoir Simulation

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916298

M. Rogowski, Suha N. Kayum, F. Mannuß

Citations: 0

Proactive Cyber Situation Awareness via High Performance Computing

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916528

A. Wollaber, Jaime Peña, Benjamin Blease, Leslie Shing, Kenneth Alperin, Serge Vilvovsky, P. Trepagnier, Neal Wagner, Leslie Leonard

Abstract: Cyber situation awareness technologies have largely been focused on present-state conditions, with limited abilities to forward-project nominal conditions in a contested environment. We demonstrate an approach that uses data-driven, high performance computing (HPC) simulations of attacker/defender activities in a logically connected network environment that enables this capability for interactive, operational decision making in real time. Our contributions are three-fold: (1) we link live cyber data to inform the parameters of a cybersecurity model, (2) we perform HPC simulations and optimizations with a genetic algorithm to evaluate and recommend risk remediation strategies that inhibit attacker lateral movement, and (3) we provide a prototype platform to allow cyber defenders to assess the value of their own alternative risk reduction strategies on a relevant timeline. We present an overview of the data and software architectures, and results are presented that demonstrate operational utility alongside HPC-enabled runtimes.

Citations: 5

Auxiliary Maximum Likelihood Estimation for Noisy Point Cloud Registration

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916224

Cole Campton, Xiaobai Sun

Citations: 0

Concurrent Katz Centrality for Streaming Graphs

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916572

Chunxing Yin, E. J. Riedy

Citations: 2