Title: Target Classification in Synthetic Aperture Radar and Optical Imagery Using Loihi Neuromorphic Hardware
Authors: Mark D. Barnell, Courtney Raymond, Matthew Wilson, Darrek Isereau, Chris Cicotta
In: 2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: https://doi.org/10.1109/HPEC43674.2020.9286246
Abstract: Intel's novel Loihi processing chip has been used to explore new information exploitation techniques. Specifically, we analyzed two types of data (optical and radar). These data modalities and associated machine learning algorithms were used to showcase the ability of the system to address real-world problems, such as object detection and classification. Intel's fully digital Loihi design is inspired by biological processes and brain functions. Neuromorphic architectures such as Loihi promise to improve computational efficiency for various machine learning tasks, with a realizable path toward implementation in many systems, e.g., airborne computing for intelligence, surveillance, and reconnaissance systems, and/or future autonomous vehicles and household appliances. With the current software development kit, it is possible to train an artificial neural network model in a common deep learning framework such as Keras and quantize the model weights for a simple, direct translation onto the Loihi hardware. The radar imagery analyzed included a seven-class vehicle target set, which was processed at a rate of 9.5 images per second with an overall accuracy of 90.1%. The optical data included a binary (two-class) data set and a nine-class data set. The binary classifier processed the optical data at a rate of 12.8 images per second with 94.0% accuracy; the nine-class optical data was processed at a rate of 12.9 images per second with 79.7% accuracy. Lastly, the system used ~6 W of total power, with ~0.6 W utilized by the neuromorphic cores. The inferencing energy used to classify each image varied between 14.9 and 63.2 millijoules.
Title: Triangle Counting with Cyclic Distributions
Authors: A. Lumsdaine, Luke Dalessandro, Kevin Deweese, J. Firoz, Scott McMillan
In: 2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: https://doi.org/10.1109/HPEC43674.2020.9286220
Abstract: Triangles are the simplest non-trivial subgraphs, and triangle counting is used in a number of different applications. The order in which vertices are processed in triangle counting strongly affects the amount of work that needs to be done (and thus the overall performance). Ordering vertices by degree has been shown to be a particularly effective approach. However, for graphs with skewed degree distributions (such as power-law graphs), ordering by degree also skews the distribution of work; parallelization must account for this distribution in order to balance work among workers. In this paper we provide an in-depth analysis of the ramifications of degree-based ordering on parallel triangle counting. We present an approach for partitioning work in triangle counting, based on cyclic distribution and some surprisingly simple C++ implementations. Experimental results demonstrate the effectiveness of our approach, particularly for power-law (and social network) graphs.
{"title":"Exploiting GPU Direct Access to Non-Volatile Memory to Accelerate Big Data Processing","authors":"Mahsa Bayati, M. Leeser, N. Mi","doi":"10.1109/HPEC43674.2020.9286174","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286174","url":null,"abstract":"The amount of data being collected for analysis is growing at an exponential rate. Along with this growth comes increasing necessity for computation and storage. Researchers are addressing these needs by building heterogeneous clusters with CPUs and computational accelerators such as GPUs equipped with high I/O bandwidth storage devices. One of the main bottlenecks of such heterogeneous systems is the data transfer bandwidth to GPUs when running I/O intensive applications. The traditional approach gets data from storage to the host memory and then transfers it to the GPU, which can limit data throughput and processing and thus degrade the end-to-end performance. In this paper, we propose a new framework to address the above issue by exploiting Peer-to-Peer Direct Memory Access to allow GPU direct access of the storage device and thus enhance the performance for parallel data processing applications in a heterogeneous big-data platform. Our heterogeneous cluster is supplied with CPUs and GPUs as computing resources and Non-Volatile Memory express (NVMe) drives as storage resources. We deploy an Apache Spark platform to execute representative data processing workloads over this heterogeneous cluster and then adopt Peer-to-Peer Direct Memory Access to connect GPUs to non-volatile storage directly to optimize the GPU data access. Experimental results reveal that this heterogeneous Spark platform successfully bypasses the host memory and enables GPUs to communicate directly to the NVMe drive, thus achieving higher data transfer throughput and improving both data communication time and end-to-end nerformance by 20%.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127440208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Parallel Simulation for ACAS X Development","authors":"A. Gjersvik","doi":"10.1109/HPEC43674.2020.9286197","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286197","url":null,"abstract":"ACAS X is the next generation airborne collision avoidance system intended to meet the demands of the rapidly evolving U.S. National Airspace System (NAS). The collision avoidance safety and operational suitability of the system are optimized and continuously evaluated by simulating billions of characteristic aircraft encounters in a fast-time Monte Carlo environment. There is therefore an inherent computational cost associated with each ACAS X design iteration and parallelization of the simulations is necessary to keep up with rapid design cycles. This work describes an effort to profile and enhance the parallel computing infrastructure deployed on the computing resources offered by the Lincoln Laboratory Supercomputing Center. The approach to large-scale parallelization of our fast-time airspace encounter simulation tool is presented along with corresponding parallel profile data collected on different kinds of compute nodes. A simple stochastic model for distributed simulation is also presented to inform optimal work batching for improved simulation efficiency. The paper concludes with a discussion on how this high-performance parallel simulation method enables the rapid safety-critical design of ACAS X in a fast-paced iterative design process.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126020577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Parameter Sensitivity Analysis of the SparTen High Performance Sparse Tensor Decomposition Software
Authors: J. Myers, Daniel M. Dunlavy, K. Teranishi, D. Hollman
In: 2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: https://doi.org/10.1109/HPEC43674.2020.9286210
Abstract: Tensor decomposition models play an increasingly important role in modern data science applications. One problem of particular interest is fitting a low-rank Canonical Polyadic (CP) tensor decomposition model when the tensor has sparse structure and the tensor elements are nonnegative count data. SparTen is a high-performance C++ library which computes a low-rank decomposition using different solvers: a first-order quasi-Newton method or a second-order damped Newton method, along with the appropriate choice of runtime parameters. Since the default parameters in SparTen were tuned to experimental results from prior published work, conducted on a single real-world dataset using MATLAB implementations of these methods, it remains unclear whether the parameter defaults in SparTen are appropriate for general tensor data. Furthermore, it is unknown how sensitive algorithm convergence is to changes in the input parameter values. This report addresses these unresolved issues with large-scale experimentation on three benchmark tensor data sets. Experiments were conducted on several different CPU architectures and replicated with many initial states to establish generalized profiles of algorithm convergence behavior.
Title: Beyond Floating-Point Ops: CNN Performance Prediction with Critical Datapath Length
Authors: David Langerman, A. Johnson, Kyle Buettner, A. George
In: 2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: https://doi.org/10.1109/HPEC43674.2020.9286182
Abstract: We propose Critical Datapath Length (CDL), a powerful, interpretable metric of neural-network models that enables accurate execution time prediction on parallel device architectures. CDL addresses the fact that the total number of floating-point operations (FLOPs) in a model is an inconsistent predictor of real execution time due to the highly parallel nature of tensor operations and hardware accelerators. Our results show that, on GPUs, CDL correlates to execution time significantly better than FLOPs, making it a useful performance predictor.
{"title":"Large-scale Sparse Tensor Decomposition Using a Damped Gauss-Newton Method","authors":"Teresa M. Ranadive, M. Baskaran","doi":"10.1109/HPEC43674.2020.9286202","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286202","url":null,"abstract":"CANDECOMP/PARAFAC (CP) tensor decomposition is a popular unsupervised machine learning method with numerous applications. This process involves modeling a high-dimensional, multi-modal array (a tensor) as the sum of several low-dimensional components. In order to decompose a tensor, one must solve an optimization problem, whose objective is often given by the sum of the squares of the tensor and decomposition model entry differences. One algorithm occasionally utilized to solve such problems is CP-OPT-DGN, a damped Gauss-Newton all-at-once optimization method for CP tensor decomposition. However, there are currently no published results that consider the decomposition of large-scale (with up to billions of non-zeros), sparse tensors using this algorithm. This work considers the decomposition of large-scale tensors using an efficiently implemented CP-OPT-DGN method. It is observed that CP-OPT-DGN significantly outperforms CP-ALS (CP-Alternating Least Squares) and CP-OPT-QNR (a quasi-Newton-Raphson all-at-once optimization method for CP tensor decomposition), two other widely used tensor decomposition algorithms, in terms of accuracy and latent behavior detection.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126700663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using RAPIDS AI to Accelerate Graph Data Science Workflows","authors":"Todd Hricik, David A. Bader, Oded Green","doi":"10.1109/HPEC43674.2020.9286224","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286224","url":null,"abstract":"Scale free networks are abundant in many natural, social, and engineering phenomena for which there exists a substantial corpus of theory able to elucidate many of their underlying properties. In this paper we study the scalability of some widely available Python-based tools for the empirical investigation of scale free network data in a typical early stage analysis pipeline. We demonstrate how porting serial implementations of commonly used pipeline data structures and methods to parallel hardware via the NVIDIA RAPIDS AI API requires minimal rewriting of code. As a utility for each pipeline we recorded the time required to complete the analysis for both the serial and parallelized workflows on a task-wise basis. Furthermore, we review a statistically based methodology for fitting a power-law to empirical data. Maximum likelihood estimations for scale were inferred after using Kolmogorov-Smirnov based methods to determine location estimates. Our serial implementation of a typical early stage network analysis workflow uses a combination of widely used data structures and algorithms provided by the NumPy, Pandas and NetworkX frameworks. We then parallelized our workflow using the APIs provided by NVIDIA's RAPIDS AI open data science libraries and measured the relative time to completion for the tasks of ingesting raw data, creating a graph representation of the data and finally fitting a power-law distribution to the empirical observations. The results of our experiments, run on graphs ranging in size from 1 million to 20 million edges, demonstrate that significantly less time is required to complete the tasks of generating a graph from an edge list, computing the degree of all nodes in the graph and fitting the scale and location parameters to the observed data.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114090785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: MetaCL: Automated "Meta" OpenCL Code Generation for High-Level Synthesis on FPGA
Authors: P. Sathre, Atharva Gondhalekar, Mohamed W. Hassan, W. Feng
In: 2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: https://doi.org/10.1109/HPEC43674.2020.9286198
Abstract: Traditionally, FPGA programming has been done via a hardware description language (HDL). An HDL provides fine-grained control over reconfigurable hardware but with limited productivity due to a steep learning curve and tedious design cycle. Thus, high-level synthesis (HLS) approaches have been a significant boon to productivity, and in recent years, OpenCL has emerged as a vendor-agnostic HLS language that offers the added benefit of interoperation with other OpenCL platforms (e.g., CPU, GPU, DSP) and existing OpenCL software. However, OpenCL's productivity can also suffer from tedious boilerplate code and the need to manually coordinate the host (i.e., CPU) and device (i.e., FPGA or other device). So, we present MetaCL, a compiler-assisted interface that takes OpenCL kernel functions as input and automatically generates OpenCL host-side code as output. MetaCL produces more efficient and readable host-side code, ensures portability, and introduces minimal additional runtime overhead compared to unassisted OpenCL development.
{"title":"Accelerator Design and Performance Modeling for Homomorphic Encrypted CNN Inference","authors":"Tian Ye, R. Kannan, V. Prasanna","doi":"10.1109/HPEC43674.2020.9286219","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286219","url":null,"abstract":"The rapid advent of cloud computing has brought with it concerns on data security and privacy. Fully Homomorphic Encryption (FHE) is a technique for enabling data security that allows arbitrary computations to be performed directly on encrypted data. In particular, FHE can be used with convolutional neural networks (CNN) to perform inference as a service on homomorphic encrypted input data. However, the high computational demands of FHE inference require a careful understanding of the tradeoffs between various parameters such as security level, hardware resources and performance. In this paper, we propose a parameterized accelerator for homomorphic encrypted CNN inference. We first develop parallel algorithms to implement CNN operations via FHE primitives. We then develop a parameterized model to evaluate the performance of our CNN design. The model accepts inputs in terms of available hardware resources and security parameters and outputs performance estimates. As an illustration, for a typical image classification task on CIFAR-10 dataset with a seven-layer CNN model, we show that a batch of 4K encrypted images can be classified within 1 second on a device operating at 2 GHz clock rate with 16K MACs, 64 MB on-chip memory and 256 GB/s external memory bandwidth.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121546914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}