2019 IEEE High Performance Extreme Computing Conference (HPEC): Latest Papers, Page 4

Deep Learning-Based Nuclei Segmentation of Cleared Brain Tissue

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916435

Pooya Khorrami, K. Brady, Mark Hernandez, L. Gjesteby, S. Burke, Damon G. Lamb, Matthew A. Melton, K. Otto, L. Brattain

Citations: 2

Linear Algebra-Based Triangle Counting via Fine-Grained Tasking on Heterogeneous Environments : (Update on Static Graph Challenge)

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916233

Abdurrahman Yasar, S. Rajamanickam, Jonathan W. Berry, Michael M. Wolf, Jeffrey S. Young, Ümit V. Çatalyürek

Abstract: Triangle counting is a representative graph problem that shows the challenges of improving graph algorithm performance using algorithmic techniques and adapting graph algorithms to new architectures. In this paper, we describe an update to the linear-algebraic formulation of the triangle counting problem. Our new approach relies on fine-grained tasking based on a tile layout. We adapt this task-based algorithm to heterogeneous architectures (CPUs and GPUs) for up to a 10.8x speedup over the past year's Graph Challenge submission. This implementation also results in the fastest kernel time known at time of publication for real-world graphs like twitter (3.7 seconds) and friendster (1.8 seconds) on GPU accelerators when the graph is GPU resident. This is a 1.7x and 1.2x improvement over the previous state-of-the-art triangle counting on GPUs. We also improved end-to-end execution time by overlapping computation and communication of the graph to the GPUs. In terms of end-to-end execution time, our implementation also achieves the fastest end-to-end times due to very low overhead costs.

Citations: 16
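
The triangle-counting abstract above refers to a linear-algebraic formulation without spelling it out. As a rough illustration only, the sketch below uses one commonly cited form, sum((L · Lᵀ) ∘ L), where L is the strictly lower-triangular part of the adjacency matrix and ∘ is the element-wise mask. Whether the paper uses exactly this variant is an assumption, and the function name `count_triangles` is hypothetical. The sketch is sequential and does not model the tile layout, tasking, or GPU execution described in the abstract.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph via sum((L @ L.T) * L).

    L is the strictly lower-triangular adjacency matrix, stored sparsely
    as a dict: row index -> set of column indices (neighbors with a
    smaller vertex id). This is an illustrative sketch, not the paper's
    implementation.
    """
    lower = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        hi, lo = max(u, v), min(u, v)
        lower.setdefault(hi, set()).add(lo)  # sets absorb duplicate edges

    total = 0
    for i, row in lower.items():
        for j in row:  # mask: visit only positions where L[i][j] == 1
            # (L @ L.T)[i][j] counts k with L[i][k] == 1 and L[j][k] == 1,
            # which is the size of the intersection of rows i and j.
            total += len(row & lower.get(j, set()))
    return total


if __name__ == "__main__":
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(count_triangles(k4))
```

Because every triangle {i, j, k} with i > j > k is visited exactly once (at masked position (i, j) with common neighbor k), no final division by a symmetry factor is needed.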

Survey of Attacks and Defenses on Edge-Deployed Neural Networks

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916519

Mihailo Isakov, V. Gadepally, K. Gettings, M. Kinsy

Abstract: Deep Neural Network (DNN) workloads are quickly moving from datacenters onto edge devices, for latency, privacy, or energy reasons. While datacenter networks can be protected using conventional cybersecurity measures, edge neural networks bring a host of new security challenges. Unlike classic IoT applications, edge neural networks are typically very compute- and memory-intensive, their execution is data-independent, and they are robust to noise and faults. Neural network models may be very expensive to develop, and can potentially reveal information about the private data they were trained on, requiring special care in distribution. The hidden states and outputs of the network can also be used in reconstructing user inputs, potentially violating users’ privacy. Furthermore, neural networks are vulnerable to adversarial attacks, which may cause misclassifications and violate the integrity of the output. These properties add challenges when securing edge-deployed DNNs, requiring new considerations, threat models, priorities, and approaches in securely and privately deploying DNNs to the edge. In this work, we cover the landscape of attacks on, and defenses of, neural networks deployed in edge devices and provide a taxonomy of attacks and defenses targeting edge DNNs.

Citations: 24

Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916473

Mark P. Blanco, Tze Meng Low, Kyungjoo Kim

Citations: 16

FPGA-Accelerated Spreading for Global Placement

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916251

Shounak Dhar, L. Singhal, M. Iyer, D. Pan

Abstract: Placement takes a large part of the runtime in an Electronic Design Automation design implementation flow. In modern industrial and academic physical design implementation tools, global placement consumes a significant part of the overall placement runtime. Many of these global placers decouple the placement problem into two main parts: numerical optimization and spreading. In this paper, we propose a new and massively parallel spreading algorithm and also accelerate a part of this algorithm on FPGA. Our algorithm produces placements with comparable quality when integrated into a state-of-the-art academic placer. We formulate the spreading problem as a system of fluid flows across reservoirs and mathematically prove that this formulation produces flows without cycles when solved as a continuous-time system. We also propose a flow correction algorithm to make the flows monotonic, reduce total cell displacement, and remove cycles which may arise during the discretization process. Our new flow correction algorithm has a better time complexity for cycle removal than previous algorithms for finding cycles in a generic graph. When compared to our previously published linear programming based spreading algorithm [1], our new fluid-flow based multi-threaded spreading algorithm is 3.44x faster, and the corresponding FPGA-accelerated version is 5.15x faster.

Citations: 1

Automatic Parallelization to Asynchronous Task-Based Runtimes Through a Generic Runtime Layer

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916294

Charles Jin, M. Baskaran, Benoît Meister, J. Springer

Abstract: With the end of Moore’s law, asynchronous task-based parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability. This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0x (OpenMP) and 9.5x (Legion) using 64 threads.

Citations: 2
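
The abstract above describes data and tasks that are "sparse and tiled by default" and executed asynchronously. The following is a loose sketch of that general idea only, not of the paper's generic runtime layer, its compiler, OpenMP, or Legion. All names (`TILE`, `to_sparse_tiles`, `tile_sum`, `tiled_total`) are hypothetical, and Python's standard `concurrent.futures` thread pool stands in for a task runtime.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 2  # hypothetical tile edge length


def to_sparse_tiles(matrix):
    """Split a dense matrix into TILE x TILE blocks, keeping only
    blocks that contain at least one nonzero (the 'sparse' part)."""
    tiles = {}
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r0 in range(0, n_rows, TILE):
        for c0 in range(0, n_cols, TILE):
            block = [row[c0:c0 + TILE] for row in matrix[r0:r0 + TILE]]
            if any(x != 0 for row in block for x in row):
                tiles[(r0 // TILE, c0 // TILE)] = block
    return tiles


def tile_sum(block):
    """Per-tile task body: sum the entries of one block."""
    return sum(sum(row) for row in block)


def tiled_total(matrix, workers=4):
    """Submit one asynchronous task per stored tile, then reduce."""
    tiles = to_sparse_tiles(matrix)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(tile_sum, block) for block in tiles.values()]
        # Collecting results acts as the dependency of the final reduction
        # on every tile task.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    m = [[1, 0, 0, 0],
         [0, 2, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 3]]
    print(sorted(to_sparse_tiles(m)), tiled_total(m))
```

Empty tiles never produce a task here, which is the main point of the sketch: work is scheduled per stored tile rather than per matrix element.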

Garbled Circuits in the Cloud using FPGA Enabled Nodes

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916407

Kai Huang, Mehmet Güngör, Xin Fang, Stratis Ioannidis, M. Leeser

Citations: 14

Accelerating DNN Inference with GraphBLAS and the GPU

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916498

Xiaoyun Wang, Zhongyi Lin, Carl Yang, John Douglas Owens

Citations: 11

Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916239

Hao Wen, W. Zhang

Abstract: Unlike the traditional CPU-GPU heterogeneous architecture where CPU and GPU have separate DRAM and memory address spaces, current heterogeneous CPU-GPU architectures integrate CPU and GPU on the same die and share the same last level cache (LLC) and memory. For the two-level cache hierarchy in which CPU and GPU have their own private L1 caches but share the LLC, conflict misses in the LLC between CPU and GPU may degrade both CPU and GPU performance. In addition, how the CPU and GPU memory request flows (write-back flow from L1 and cache fill flow from main memory) are managed may impact performance. In this work, we study three different cache request flow management policies. The first policy is selective GPU LLC fill, which selectively fills the GPU requests in the LLC. The second policy is selective GPU L1 write back, which selectively writes back GPU blocks in the L1 cache to the L2 cache. The final policy is a hybrid policy that combines the first two, and selectively replaces CPU blocks in the LLC. Our experimental results indicate that the third policy is the best of these three. On average, it improves CPU performance by about 10%, with a highest CPU performance improvement of 22%, at an average GPU performance overhead of 0.8%.

Citations: 1

DistTC: High Performance Distributed Triangle Counting

2019 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2019-09-01 DOI: 10.1109/HPEC.2019.8916438

Loc Hoang, Vishwesh Jatala, Xuhao Chen, U. Agarwal, Roshan Dathathri, G. Gill, K. Pingali

Citations: 25