Title: Low Power Computing and Simultaneous Electro-Optical/Radar Data Processing using IBM’s NS16e 16-chip Neuromorphic Hardware
Authors: Mark D. Barnell, Courtney Raymond, Daniel Brown, Matthew Wilson, Éric Côté
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916311
Abstract: For the first time, advanced machine learning (ML) compute architectures, techniques, and methods were demonstrated simultaneously on United States Geological Survey (USGS) optical imagery and Department of Defense (DoD) Synthetic Aperture Radar (SAR) imagery, using IBM’s new NS16e neurosynaptic processor board, which comprises 16 TrueNorth chips. The Air Force Research Laboratory (AFRL) Information Directorate Advanced Computing and Communications Division continues to develop and demonstrate bio-inspired computing algorithms and architectures designed to provide ultra-low-power, ground and airborne High-Performance Computing (HPC) solutions for the operational and tactical real-time processing needs of Intelligence, Surveillance, and Reconnaissance (ISR) missions on small-form-factor hardware and in Size, Weight and Power (SWaP)-constrained environments. With an average throughput of 16,000 inferences per second, the system achieved a processing efficiency of 1,066 inferences per Watt. The NS16e's power utilization never exceeded 15 Watts for this application, and the TrueNorth processors themselves accounted for less than 5.5 Watts of that total.
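As a quick sanity check on the reported figures, the inferences-per-Watt efficiency follows directly from dividing the average throughput by the 15 Watt board-level power bound; the short Python snippet below reproduces that arithmetic using only the numbers quoted in the abstract.

```python
# Sanity check of the reported NS16e efficiency figures (values taken from the abstract).
throughput_inferences_per_s = 16_000   # average system throughput
board_power_watts = 15.0               # upper bound on NS16e power draw for this application

efficiency = throughput_inferences_per_s / board_power_watts
print(f"{efficiency:.0f} inferences per Watt")  # ~1067, consistent with the reported 1,066 inf/W
```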
Title: Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale
Authors: J. Ezick, Ben Parsons, W. Glodek, Thomas Henretty, M. Baskaran, R. Lethin, J. Feo, Tai-Ching Tuan, Christopher J. Coley, Leslie Leonard, R. Agrawal
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916559
Abstract: This paper describes MADHAT (Multidimensional Anomaly Detection fusing HPC, Analytics, and Tensors), an integrated workflow that demonstrates the applicability of HPC resources to the problem of maintaining cyber situational awareness. MADHAT combines two high-performance packages: ENSIGN for large-scale sparse tensor decompositions and HAGGLE for graph analytics. Tensor decompositions isolate coherent patterns of network behavior in ways that common clustering methods based on distance metrics cannot. Parallelized graph analysis then uses directed queries on a representation that combines the elements of identified patterns with other available information (such as additional log fields, domain knowledge, network topology, whitelists and blacklists, prior feedback, and published alerts) to confirm or reject a threat hypothesis, collect context, and raise alerts. MADHAT was developed using the collaborative HPC Architecture for Cyber Situational Awareness (HACSAW) research environment and evaluated on structured network sensor logs collected from Defense Research and Engineering Network (DREN) sites using HPC resources at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). To date, MADHAT has analyzed logs with over 650 million entries.
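The following Python sketch illustrates, with made-up field names and log records, how structured network-log entries can be mapped into the kind of sparse count tensor (source, destination, port, time bin) that a CP decomposition engine such as ENSIGN would factor into rank-1 behavioral patterns; it is an illustration of the general idea, not MADHAT's actual ingest code.

```python
# Hypothetical illustration: mapping structured network-log records into a sparse
# count tensor (source IP x destination IP x destination port x time bin).
from collections import Counter

logs = [
    {"src": "10.0.0.5", "dst": "10.0.1.9",  "dport": 443, "ts": 1_567_296_010},
    {"src": "10.0.0.5", "dst": "10.0.1.9",  "dport": 443, "ts": 1_567_296_025},
    {"src": "10.0.0.7", "dst": "10.0.2.14", "dport": 22,  "ts": 1_567_299_700},
]

BIN_SECONDS = 3600  # one-hour time bins

def index_map(values):
    """Assign a dense integer index to each distinct value (one map per tensor mode)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

src_ix  = index_map(r["src"] for r in logs)
dst_ix  = index_map(r["dst"] for r in logs)
port_ix = index_map(r["dport"] for r in logs)
time_ix = index_map(r["ts"] // BIN_SECONDS for r in logs)

# Sparse COO representation: (i, j, k, t) -> number of matching log entries.
# A CP decomposition of this tensor would express it as a sum of rank-1 patterns.
counts = Counter(
    (src_ix[r["src"]], dst_ix[r["dst"]], port_ix[r["dport"]], time_ix[r["ts"] // BIN_SECONDS])
    for r in logs
)
print(counts)  # {(0, 0, 1, 0): 2, (1, 1, 0, 1): 1}
```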
Title: Prototype Container-Based Platform for Extreme Quantum Computing Algorithm Development
Authors: P. Dreher, Madhuvanti Ramasami
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916430
Abstract: Recent advances in the development of the first generation of quantum computing devices have provided researchers with computational platforms to explore new ideas and reformulate conventional computational codes suitable for a quantum computer. Developers can now implement these reformulations on both quantum simulators and hardware platforms through a cloud computing software environment. For example, the IBM Q Experience provides direct access to IBM's quantum simulators and quantum computing hardware platforms. However, these access options may not provide an optimal environment for developers who need to download and modify the source code and libraries. This paper focuses on the construction of a Docker container environment with Qiskit source code and libraries running on a local cloud computing system that can directly access the IBM Q Experience. This prototype container-based system allows single users and small project groups to do rapid prototyping, testing, and implementation of extreme-capability algorithms with more agility and flexibility than the IBM Q Experience website provides. It also serves as an excellent teaching environment for labs and project assignments within graduate courses in cloud computing and quantum computing. The paper also discusses computer security challenges in expanding this prototype container system to larger groups of quantum computing researchers.
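As a rough illustration of the kind of workload such a container would host, the sketch below runs a small Bell-state circuit on a locally installed Aer simulator using the Qiskit API as it stood around the time of this paper (newer Qiskit releases have reorganized these imports); it is not taken from the authors' environment.

```python
# Minimal sketch of a Qiskit job run against the local Aer simulator inside a
# container (Qiskit API circa 2019).
from qiskit import QuantumCircuit, Aer, execute

qc = QuantumCircuit(2, 2)
qc.h(0)                      # put qubit 0 into superposition
qc.cx(0, 1)                  # entangle qubits 0 and 1 (Bell state)
qc.measure([0, 1], [0, 1])

backend = Aer.get_backend("qasm_simulator")
counts = execute(qc, backend, shots=1024).result().get_counts(qc)
print(counts)                # roughly even split between '00' and '11'
```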
Title: Update on Triangle Counting on GPU
Authors: Carl Pearson, M. Almasri, Omer Anjum, Vikram Sharma Mailthody, Zaid Qureshi, R. Nagi, Jinjun Xiong, Wen-mei W. Hwu
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916547
Abstract: This work presents an update to the triangle-counting portion of the subgraph isomorphism static graph challenge, motivated by a desire to understand the impact of CUDA unified memory on the triangle-counting problem. First, CUDA unified memory is used to overlap reading large graph data from disk with building the graph data structures in GPU memory. Second, we use CUDA unified memory hints to solve the multi-GPU performance scaling challenges present in our last submission. Finally, we improve the single-GPU kernel performance from our past submission by introducing a dynamic, work-stealing GPU kernel with persistent threads, which makes performance adaptive for large graphs without requiring a graph analysis phase.
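For context, the core operation such GPU kernels parallelize is the per-edge intersection of adjacency lists; the plain-Python sketch below shows that counting scheme on a toy edge list (a generic illustration of the technique, not the authors' CUDA kernel).

```python
# Per-edge adjacency-intersection triangle counting, the scheme GPU kernels of this
# kind typically parallelize; shown here in plain Python over an undirected edge list.
from collections import defaultdict

def count_triangles(edges):
    # Orient each edge from the lower-id endpoint (real implementations usually orient
    # by degree) so every triangle is counted exactly once.
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        lo, hi = min(u, v), max(u, v)
        neighbors[lo].add(hi)
    # For every oriented edge (u, v), triangles through it are |N(u) ∩ N(v)|.
    return sum(len(neighbors[u] & neighbors[v]) for u in neighbors for v in neighbors[u])

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(count_triangles(edges))  # 1 triangle: {0, 1, 2}
```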
{"title":"A GPU Implementation of the Sparse Deep Neural Network Graph Challenge","authors":"M. Bisson, M. Fatica","doi":"10.1109/HPEC.2019.8916223","DOIUrl":"https://doi.org/10.1109/HPEC.2019.8916223","url":null,"abstract":"This paper presents a CUDA implementation of the latest addition to the Graph Challenge, the inference computation on a collection of large sparse deep neural networks. A single Tesla V100 can compute the inference at 3.7 TeraEdges/s. Using the managed memory API available in CUDA allows for simple and efficient distribution of these computations across a multiGPU NVIDIA DGX-2 server.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124750054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
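For reference, the Sparse DNN Graph Challenge inference is a chain of sparse matrix-matrix products, each followed by a biased, clipped ReLU; the Python/SciPy sketch below shows that layer update on illustrative random matrices (the sizes, densities, bias value, and helper name infer are placeholders, not the authors' CUDA implementation).

```python
# Minimal sketch of sparse-DNN inference in the style of the Sparse DNN Graph Challenge:
# feature matrix times sparse layer weights, plus a per-layer bias, followed by a ReLU
# clipped at an upper bound. Matrix sizes and values are illustrative only.
import numpy as np
from scipy import sparse

def infer(features, layers, bias=-0.3, clip=32.0):
    Y = features
    for W in layers:
        Y = Y @ W                              # sparse matrix-matrix multiply
        Y = Y.toarray() if sparse.issparse(Y) else Y
        Y = np.clip(Y + bias, 0.0, clip)       # biased ReLU with upper clip
        Y = sparse.csr_matrix(Y)               # keep activations sparse between layers
    return Y

features = sparse.random(4, 8, density=0.5, format="csr", random_state=0)
layers = [sparse.random(8, 8, density=0.3, format="csr", random_state=l) for l in range(3)]
print(infer(features, layers).nnz)             # number of nonzero activations at the output
```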
Title: Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data
Authors: A. Cabrera, R. Chamberlain, J. Beard
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916398
Abstract: The problem of efficiently feeding processing elements and finding ways to reduce data movement is pervasive in computing. Efficient modeling of both temporal and spatial locality of memory references is invaluable in identifying superfluous data movement in a given application. To this end, we present a new way to infer both spatial and temporal locality using reuse distance analysis. This is accomplished by performing reuse distance analysis at different data block granularities: specifically, 64B, 4KiB, and 2MiB sizes. This process of simultaneously observing reuse distance with multiple granularities is called multi-spectral reuse distance. This approach allows for a qualitative analysis of spatial locality, through observing the shifting of mass in an application’s reuse signature at different granularities. Furthermore, the shift of mass is empirically measured by calculating the Earth Mover’s Distance between reuse signatures of an application. From the characterization, it is possible to determine how spatially dense the memory references of an application are based on the degree to which the mass has shifted (or not shifted) and how close (or far) the Earth Mover’s Distance is to zero as the data block granularity is increased. It is also possible to determine an appropriate page size from this information, and whether or not a given page is being fully utilized. From the applications profiled, it is observed that not all applications will benefit from having a larger page size. Additionally, larger data block granularities subsuming smaller ones suggest that larger pages will allow for more spatial locality exploitation, but examining the memory footprint will show whether those larger pages are fully utilized or not.
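A minimal Python sketch of the multi-granularity idea: the same (synthetic) address trace is analyzed after mapping addresses to 64 B, 4 KiB, and 2 MiB blocks, and the reuse-distance signature visibly shifts toward smaller distances as the blocks grow. This is a plain stack-distance illustration of the concept, not the profiling tooling used in the paper.

```python
# Reuse distance at multiple block granularities ("multi-spectral" reuse distance).
from collections import OrderedDict

def reuse_distances(addresses, block_bytes):
    """Reuse distance = number of distinct blocks touched since the last use of a block."""
    last_seen = OrderedDict()                    # LRU stack: most recently used block last
    distances = []
    for addr in addresses:
        block = addr // block_bytes
        if block in last_seen:
            depth = list(reversed(last_seen)).index(block)  # distinct blocks since last use
            distances.append(depth)
        else:
            distances.append(float("inf"))       # cold miss: infinite reuse distance
        last_seen[block] = None
        last_seen.move_to_end(block)
    return distances

trace = [0x1000, 0x1040, 0x2000, 0x1000, 0x400000, 0x1040]
for granularity in (64, 4096, 2 * 1024 * 1024):
    # Mass shifts toward smaller distances as the block granularity grows.
    print(granularity, reuse_distances(trace, granularity))
```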
Title: IP Cores for Graph Kernels on FPGAs
Authors: S. Kuppannagari, Rachit Rajat, R. Kannan, A. Dasu, V. Prasanna
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916363
Abstract: Graphs are a powerful abstraction for representing networked data in many real-world applications. The need for performing large-scale graph analytics has led to widespread adoption of dedicated hardware accelerators such as FPGAs for this purpose. In this work, we develop IP cores for several key graph kernels. Our IP cores use the graph processing over partitions (GPOP) programming paradigm to perform computations over graph partitions. Partitioning the input graph into non-overlapping partitions improves on-chip data reuse. Additional optimizations to exploit intra- and inter-partition parallelism and to reduce external memory accesses are also discussed. We generate FPGA designs for general graph algorithms with various vertex attributes and update propagation functions, such as Sparse Matrix Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC). We target a platform consisting of large external DDR4 memory to store the graph data and an Intel Stratix FPGA to accelerate the processing. Experimental results show that our accelerators sustain a high throughput of up to 2250, 2300, 3378, and 2178 Million Traversed Edges Per Second (MTEPS) for SpMV, PR, SSSP, and WCC, respectively. Compared with several highly optimized multi-core designs, our FPGA framework achieves up to 20.5× speedup for SpMV, 16.4× for PR, 3.5× for SSSP, and 35.1× for WCC; compared with two state-of-the-art FPGA frameworks, our designs demonstrate up to 5.3× speedup for SpMV, 1.64× for PR, and 1.8× for WCC. We develop a performance model for our GPOP paradigm and use it to predict the performance of our designs assuming the graph is stored in HBM2 instead of DRAM. We further discuss extensions to our optimizations to improve throughput.
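The sketch below illustrates the scatter/gather structure of the GPOP paradigm in plain Python: edge contributions are first binned by destination partition and then reduced one partition at a time, so each phase touches only a partition-sized working set. The function gpop_spmv and the toy graph are illustrative stand-ins, not the FPGA IP cores themselves.

```python
# Graph processing over partitions (GPOP), illustrated as a partition-wise SpMV-style
# propagation: scatter edge contributions by destination partition, then gather each
# partition's updates with good locality.
def gpop_spmv(num_vertices, edges, values, num_partitions):
    part_size = (num_vertices + num_partitions - 1) // num_partitions
    part_of = lambda v: v // part_size

    # Scatter phase: bucket each edge's contribution by destination partition.
    bins = [[] for _ in range(num_partitions)]
    for src, dst, weight in edges:
        bins[part_of(dst)].append((dst, weight * values[src]))

    # Gather phase: reduce one destination partition at a time (partition-sized working set).
    result = [0.0] * num_vertices
    for bucket in bins:
        for dst, contribution in bucket:
            result[dst] += contribution
    return result

edges = [(0, 1, 1.0), (1, 2, 0.5), (3, 2, 2.0), (2, 0, 1.0)]
print(gpop_spmv(4, edges, values=[1.0, 2.0, 3.0, 4.0], num_partitions=2))  # [3.0, 1.0, 9.0, 0.0]
```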
Title: BLAST: Blockchain-based Trust Management in Smart Cities and Connected Vehicles Setup
Authors: Farah I. Kandah, Brennan Huber, Amani Altarawneh, Sai Medury, A. Skjellum
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916229
Abstract: Advances in communication technologies and the Internet of Things (IoT) are driving smart-city adoption, which aims to increase the operational efficiency of infrastructure and to improve the quality of services and citizen welfare, among other worthy goals. For instance, it is estimated that by 2020, 75% of cars shipped globally will be equipped with hardware to facilitate vehicle connectivity. The privacy, reliability, and integrity of communication must be ensured so that actions can be accurate and implemented promptly after receiving actionable information. Because vehicles are equipped with the ability to compute, communicate, and sense their environment, there is a concomitant critical need to create and maintain trust among network entities in the context of the network’s dynamism, an issue that requires trust between entities to be built and validated in the short time before they leave each other’s range. In this work, we present a multi-tier scheme consisting of an authentication- and trust-building/distribution framework designed with blockchain technology to ensure the safety and validity of the information exchanged in the system. Through simulation, we illustrate the tradeoff between blockchain mining time and the number of blocks being generated, as well as the effect of vehicle speed on the number of blocks being generated.
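As generic background (not BLAST's specific consensus or trust scheme), the Python sketch below shows why mining time trades off against block-generation rate: a block of trust records is only accepted once a nonce is found whose hash clears the difficulty target, so a harder target directly lowers the number of blocks produced per unit time. The record fields and difficulty value are hypothetical.

```python
# Generic proof-of-work illustration of the mining-time vs. block-count tradeoff.
import hashlib, json, time

def mine_block(records, prev_hash, difficulty_bits=16):
    target = 1 << (256 - difficulty_bits)      # harder target (more bits) -> longer mining time
    payload = json.dumps(records, sort_keys=True)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{prev_hash}{payload}{nonce}".encode()).hexdigest()
        if int(digest, 16) < target:
            return {"prev": prev_hash, "records": records, "nonce": nonce, "hash": digest}
        nonce += 1

start = time.time()
block = mine_block([{"vehicle": "V42", "trust": 0.91}], prev_hash="0" * 64)
print(f"mined in {time.time() - start:.2f}s after {block['nonce']} attempts")
```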
Title: Optimizing the Visualization Pipeline of a 3-D Monitoring and Management System
Authors: Rebecca Wild, M. Hubbell, J. Kepner
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916493
Abstract: Monitoring and managing High Performance Computing (HPC) systems and environments generates an ever-growing amount of data. Making sense of this data, and presenting it so that system administrators and management can proactively identify system failures or understand the state of the system, requires a visualization platform that is as efficient and scalable as the underlying database tools used to store and analyze the data. In this paper we show how we leverage Accumulo, D4M, and Unity to build a 3D visualization platform for monitoring and managing the Lincoln Laboratory supercomputing systems, and how we have had to retool our approach to scale with those systems.
Title: Artificial Neural Network and Accelerator Co-design using Evolutionary Algorithms
Authors: Philip Colangelo, Oren Segal, Alexander Speicher, M. Margala
Venue: 2019 IEEE High Performance Extreme Computing Conference (HPEC), September 2019. DOI: 10.1109/HPEC.2019.8916533
Abstract: Multilayer feed-forward Artificial Neural Networks (ANNs) are universal function approximators capable of modeling measurable functions to any desired degree of accuracy. In practice, designing efficient neural network architectures requires significant effort and expertise, and designing architectures that also fit optimally on hardware for the benefit of acceleration adds yet another degree of complexity. In this paper, we use Evolutionary Cell Aided Design (ECAD), a framework capable of searching the design spaces of ANN structures and reconfigurable hardware to find solutions based on a set of constraints and fitness functions. A modular and scalable 2D systolic-array-based machine learning accelerator design, built for an Arria 10 GX 1150 FPGA device using OpenCL, enables results to be tested and deployed on real hardware; alongside the hardware, a software model of the architecture was developed to speed up the evolutionary process. We present results from the ECAD framework showing the effect that various optimization objectives, including accuracy, images per second, effective giga-operations per second, and latency, have on both the ANN and hardware configurations. Through this work we show that a distinct best-performing solution can exist for each optimization objective. This work lays the foundation for finding machine-learning-based solutions for a wide range of applications having different system constraints.
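The toy Python sketch below mirrors the loop ECAD describes: each candidate pairs an ANN configuration with a systolic-array configuration, a fitness function scores the pair against a chosen objective, and selection plus mutation drive the search. The configuration fields and the surrogate fitness model are hypothetical stand-ins, not the framework's actual models.

```python
# Toy evolutionary co-design loop over joint ANN + accelerator configurations.
import random

def random_candidate():
    return {"layers": random.randint(2, 8),
            "width": random.choice([64, 128, 256, 512]),
            "pe_rows": random.choice([8, 16, 32]),
            "pe_cols": random.choice([8, 16, 32])}

def fitness(c):
    # Hypothetical surrogate: reward model capacity and array throughput, penalize latency.
    capacity = c["layers"] * c["width"]
    throughput = c["pe_rows"] * c["pe_cols"]
    latency = capacity / throughput
    return throughput + 0.1 * capacity - 5.0 * latency

def evolve(generations=20, population=16):
    pop = [random_candidate() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: population // 2]             # selection: keep the fitter half
        children = []
        for p in parents:
            child = dict(p)
            gene = random.choice(list(child))        # mutation: resample one field
            child[gene] = random_candidate()[gene]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())  # best joint ANN/accelerator configuration found by the toy search
```

A different fitness function (for example, one weighted toward latency) would generally drive the search to a different best configuration, which is the paper's point that each optimization objective can have its own optimal ANN/hardware pairing.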