Machine Learning Algorithm Performance on the Lucata Computer
P. Springer, Thomas Schibler, Géraud Krawezik, J. Lightholder, P. Kogge
2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: 10.1109/HPEC43674.2020.9286158

Abstract: A new parallel computing paradigm has recently become available, one that combines a PIM (processor-in-memory) architecture with many lightweight threads, where each thread migrates automatically to the memory it uses. Our effort focuses on producing performance gains on this architecture for a key machine learning algorithm, Random Forest, that are at least linear in the number of cores. Beyond that, we show that a data distribution that groups test samples and trees by feature improves run times by a factor of more than twice the number of cores in the machine.

Dynamic Computational Diversity with Multi-Radix Logic and Memory
P. Flikkema, James Palmer, Tolga Yalçin, B. Cambou
2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: 10.1109/HPEC43674.2020.9286255

Abstract: Today's computing systems are highly vulnerable to attacks, in large part because nearly all computers belong to a hardware and software monoculture within their market, industry, or sector. This is of special concern in the mission-critical networked systems upon which our civil, industrial, and defense infrastructures increasingly rely. One approach to this challenge is to endow these systems with dynamic computational diversity, wherein each processor assumes a sequence of unique variants and executes only machine code encoded for the variant that is active during a given time interval. The variants are drawn from a very large set, all adhering to a computational diversity architecture that is based on an underlying instruction set architecture (ISA). Any population of machines belonging to a specific diversity architecture therefore consists of a temporally dynamic set of essentially unique variants, while the underlying ISA still permits a common development toolchain for the diversity architecture. Our approach is hardware-centric, relying on the rapidly developing microelectronics technologies of ternary computing, resistive RAM (ReRAM), and physical unclonable functions. This paper describes our ongoing work on dynamic computational diversity, which targets the principled design of a secure processor for embedded applications.

{"title":"Profiling and Optimization of CT Reconstruction on Nvidia Quadro GV100","authors":"S. Dwivedi, Andreas Heumann","doi":"10.1109/HPEC43674.2020.9286223","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286223","url":null,"abstract":"Computed Tomography (CT) Imaging is a widely used technique for medical and industrial applications. Iterative reconstruction algorithms are desired for improved reconstructed image quality and lower dose, but its computational requirements limit its practical usage. Reconstruction toolkit (RTK) is a package of open source GPU accelerated algorithms for CBCT (cone beam computed tomography). GPU based iterative algorithms gives immense acceleration, but it may not be optimized to use the GPU resources efficiently. Nvidia has released several profilers (Nsight-systems, Nsight-compute) to analyze the GPU implementation of an algorithm from compute utilization and memory efficiency perspective. This paper profiles and analyzes the GPU implementation of iterative FDK algorithm in RTK and optimizes it for computation and memory usage on a Quadro GV100 GPU with 32 GB of memory and over 5000 cuda cores. RTK based GPU accelerated iterative FDK when applied on a 4 byte per pixel input projection dataset of size 1.1 GB (512×512×1024) for 20 iterations, to reconstruct a volume of size 440 MB (512×512×441) with 4 byte per pixel, resulted in total runtime of ~11.2 seconds per iteration. Optimized RTK based iterative FDK presented in this paper took ~1.3 seconds per iteration.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122253646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Throughput Image Alignment for Connectomics using Frugal Snap Judgments","authors":"Tim Kaler, Brian Wheatman, Sarah Wooders","doi":"10.1109/HPEC43674.2020.9286243","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286243","url":null,"abstract":"The accuracy and computational efficiency of image alignment directly affects the advancement of connectomics, a field which seeks to understand the structure of the brain through electron microscopy. We introduce the algorithms Quilter and Stacker that are designed to perform 2D and 3D alignment respectively on petabyte-scale data sets from connectomics. Quilter and Stacker are efficient, scalable, and can run on hardware ranging from a researcher's laptop to a large computing cluster. On a single 18-core cloud machine each algorithm achieves throughputs of more than 1 TB/hr; when combined the algorithms produce an end-to-end alignment pipeline that processes data at a rate of 0.82 TB/hr - an over 10x improvement from previous systems. This efficiency comes from both traditional optimizations and from the use of “Frugal Snap Judgments” to judiciously exploit performance-accuracy trade-offs. A high-throughput image-alignment pipeline was implemented using the Quilter and Stacker algorithms and its performance was evaluated using three datasets whose size ranged from 550 GB to 38 TB. The full alignment pipeline achieved a throughput of 0.6-0.8 TB/hr and 1.4-1.5 TB/hr on an 18-core and 112-core shared-memory multicore, respectively. On a supercomputing cluster with 200 nodes and 1600 total cores, the pipeline achieved a throughput of 21.4 TB/hr. We introduce the algorithms Quilter and Stacker that are designed to perform 2D and 3D alignment respectively on petabyte-scale data sets from connectomics. Quilter and Stacker are efficient, scalable, and can run on hardware ranging from a researcher's laptop to a large computing cluster. On a single 18-core cloud machine each algorithm achieves throughputs of more than 1 TB/hr; when combined the algorithms produce an end-to-end alignment pipeline that processes data at a rate of 0.82 TB/hr - an over 10x improvement from previous systems. This efficiency comes from both traditional optimizations and from the use of “Frugal Snap Judgments” to judiciously exploit performance-accuracy trade-offs. A high-throughput image-alignment pipeline was implemented using the Quilter and Stacker algorithms and its performance was evaluated using three datasets whose size ranged from 550 GB to 38 TB. The full alignment pipeline achieved a throughput of 0.6-0.8 TB/hr and 1.4-1.5 TB/hr on an 18-core and 112-core shared-memory multicore, respectively. On a supercomputing cluster with 200 nodes and 1600 total cores, the pipeline achieved a throughput of 21.4 TB/hr. A high-throughput image-alignment pipeline was implemented using the Quilter and Stacker algorithms and its performance was evaluated using three datasets whose size ranged from 550 GB to 38 TB. The full alignment pipeline achieved a throughput of 0.6-0.8 TB/hr and 1.4-1.5 TB/hr on an 18-core and 112-core shared-memory multicore, respectively. 
On a supercomputing cluster with 200 nodes and 1600 total cores, the pipeline achieved a throughput of 21.4 TB/hr.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128209177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
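For a sense of scale, the reported throughputs imply roughly the following end-to-end times for the largest (38 TB) dataset; this is simple arithmetic on the abstract's own numbers, not additional measurements:

    # Rough end-to-end times implied by the reported throughputs for the 38 TB dataset.
    dataset_tb = 38
    throughputs = [
        ("18-core multicore", 0.7),    # midpoint of 0.6-0.8 TB/hr
        ("112-core multicore", 1.45),  # midpoint of 1.4-1.5 TB/hr
        ("200-node cluster", 21.4),
    ]
    for label, tb_per_hr in throughputs:
        print(f"{label}: ~{dataset_tb / tb_per_hr:.1f} hours")
    # -> roughly 54, 26, and 1.8 hours, respectively
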
A Scalable Architecture for CNN Accelerators Leveraging High-Performance Memories
Maarten Hattink, G. D. Guglielmo, L. Carloni, K. Bergman
2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: 10.1109/HPEC43674.2020.9286162

Abstract: As FPGA-based accelerators become ubiquitous and more powerful, the demand for integration with High-Performance Memory (HPM) grows. Although HPMs offer much greater bandwidth than standard DDR4 DRAM, they introduce new design challenges such as increased latency and a higher bandwidth mismatch between memory and FPGA cores. This paper presents a scalable architecture for convolutional neural network accelerators conceived specifically to address these challenges and make full use of the memory's high bandwidth. The accelerator, which was designed using high-level synthesis, is highly configurable. The intrinsic parallelism of its architecture allows near-perfect scaling up to saturating the available memory bandwidth.

{"title":"Evaluating SEU Resilience of CNNs with Fault Injection","authors":"Evan T. Kain, Tyler M. Lovelly, A. George","doi":"10.1109/HPEC43674.2020.9286168","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286168","url":null,"abstract":"Convolutional neural networks (CNNs) are quickly growing as a solution for advanced image processing in many mission-critical high-performance and embedded computing systems ranging from supercomputers and data centers to aircraft and spacecraft. However, the systems running CNNs are increasingly susceptible to single-event upsets (SEUs) which are bit flips that result from charged particle strikes. To better understand how to mitigate the effects of SEUs on CNNs, the behavior of CNNs when exposed to SEUs must be better understood. Software fault-injection tools allow us to emulate SEUs to analyze the effects of various CNN architectures and input data features on overall resilience. Fault injection on three combinations of CNNs and datasets yielded insights into their behavior. When focusing on a threshold of 1% error in classification accuracy, more complex CNNs tended to be less resilient to SEUs, and easier classification tasks on well-clustered input data were more resilient to SEUs. Overall, the number of bits flipped to reach this threshold ranged from 20 to 3,790 bits. Results demonstrate that CNNs are highly resilient to SEUs, but the complexity of the CNN and difficulty of the classification task will decrease that resilience.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133298381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arithmetic and Boolean Secret Sharing MPC on FPGAs in the Data Center
Rushi Patel, Pierre-Francois W. Wolfe, Robert Munafo, Mayank Varia, Martin C. Herbordt
2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: 10.1109/HPEC43674.2020.9286159

Abstract: Multi-Party Computation (MPC) is an important technique for enabling computation over confidential data from several sources. The public cloud provides a unique opportunity to run MPC in a low-latency environment, and the adoption of Field Programmable Gate Array (FPGA) hardware allows both MPC acceleration and access to the low-latency, high-bandwidth communication networks that substantially improve the performance of MPC applications. In this work, we show how designing arithmetic and Boolean Multi-Party Computation gates for FPGAs in the cloud improves on current MPC offerings and eases their use in applications such as machine learning. We build our FPGA MPC on the Secret Sharing MPC scheme first designed by Araki et al. [1] and compare it with approaches that use Garbled Circuits for MPC. We show that Secret Sharing MPC makes better use of cloud resources, specifically FPGA acceleration, than Garbled Circuits and uses at least 10x fewer compute resources than the original CPU-based design.

Chip-to-chip Optical Data Communications using Polarization Division Multiplexing
D. Ivanovich, Chenfeng Zhao, Xuan Zhang, R. Chamberlain, A. Deliwala, V. Gruev
2020 IEEE High Performance Extreme Computing Conference (HPEC). DOI: 10.1109/HPEC43674.2020.9286227

Abstract: Short-distance optical communication is challenging in significant part because effective systems are expensive to construct. We describe an optical data communication system that is designed to operate over very short distances (neighboring chips on a board) and is compatible with traditional CMOS fabrication, substantially decreasing the cost to build relative to previous approaches. Polarization division multiplexing is exploited to increase the achievable data rates.

{"title":"Post Quantum Cryptography(PQC) - An overview: (Invited Paper)","authors":"M. Kumar, P. Pattnaik","doi":"10.1109/HPEC43674.2020.9286147","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286147","url":null,"abstract":"We discuss the Post Quantum Cryptography algorithms for key establishment under consideration by NIST for standardization. Three of these, Crystals- Kyber, Classic McEliece and Supersingular Isogeny based Key Encapsulation (SIKE), are representatives of the three classes of hard problems underlying the security of almost all 69 candidate algorithms accepted by NIST for consideration in round 1 of evaluation. For each algorithm, we briefly describe the hard problem underlying the algorithm's cryptographic strength, the algebraic structure i.e., the groups or finite fields, underlying the computations, the basic computations performed in these algorithms, the algorithm itself, and the performance considerations for efficient implementation of the basic algorithm on conventional many-core processors. For Crystals- Kyber and SIKE, we will discuss the potential solutions to improve their performance on many-core processors.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114874666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Performance Evaluation of Optimizations for OpenCL FPGA Kernels","authors":"A. Cabrera, R. Chamberlain","doi":"10.1109/HPEC43674.2020.9286221","DOIUrl":"https://doi.org/10.1109/HPEC43674.2020.9286221","url":null,"abstract":"The use of FPGAs in heterogeneous systems are valuable because they can be used to architect custom hardware to accelerate a particular application or domain. However, they are notoriously difficult to program. The development of high level synthesis tools like OpenCL make FPGA development more accessible, but not without its own challenges. The synthesized hardware comes from a description that is semantically closer to the application, which leaves the underlying hardware implementation unclear. Moreover, the interaction of the hardware tuning knobs exposed using a higher level specification increases the challenge of finding the most performant hardware configuration. In this work, we address these aforementioned challenges by describing how to approach the design space, using both information from the literature as well as by describing a methodology to better visualize the resulting hardware from the high level specification. Finally, we present an empirical evaluation of the impact of vectorizing data types as a tunable knob and its interaction among other coarse-grained hardware knobs.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127650867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}