Title: Platform Agnostic Streaming Data Application Performance Models
Authors: Clayton J. Faber, Tom Plano, Samatha Kodali, Zhili Xiao, Abhishek Dwaraki, J. Buhler, R. Chamberlain, A. Cabrera
Venue: 2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA), November 2021
DOI: https://doi.org/10.1109/rsdha54838.2021.00008
Abstract: The mapping of computational needs onto execution resources is, by and large, a manual task, and users are frequently guided simply by intuition and past experience. We present a queueing-theory-based performance model for streaming data applications that takes steps toward a better understanding of resource mapping decisions, thereby assisting application developers in making good mapping choices. The performance model (and associated cost model) is agnostic to the specific properties of the compute resource and application, characterizing each simply by its achievable data throughput. We illustrate the model with a pair of applications, one from computational biology and the other a classic machine learning problem.
Title: ELIχR: Eliminating Computation Redundancy in CNN-Based Video Processing
Authors: Jordan Schmerge, Daniel Mawhirter, Connor Holmes, Jedidiah McClurg, Bo-Zong Wu
Venue: 2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA), November 2021
DOI: https://doi.org/10.1109/rsdha54838.2021.00010
Abstract: Video processing frequently relies on applying convolutional neural networks (CNNs) for various tasks, including object tracking, real-time action classification, and image recognition. Due to complicated network design, processing even a single frame requires many operations, leading to low throughput and high latency. This process can be parallelized, but since consecutive images have similar content, most of these operations produce identical results, leading to inefficient usage of parallel hardware accelerators. In this paper, we present ELIχR, a software system that systematically addresses this computation redundancy problem in an architecture-independent way, using two key techniques. First, ELIχR implements a lightweight change propagation algorithm to automatically determine which data to recompute for each new frame based on changes in the input. Second, ELIχR implements a dynamic check to further reduce needed computations by leveraging special operators in the model (e.g., ReLU), trading off accuracy for performance. We evaluate ELIχR on two real-world models, Inception V3 and ResNet-50, and two video streams. We show that ELIχR running on the CPU produces up to 3.49X speedup (1.76X on average) compared with frame sampling, given the same accuracy and real-time processing requirements, and we describe how our approach can be applied in an architecture-independent way to improve CNN performance in heterogeneous systems.
Title: Energy Efficient Task Graph Execution Using Compute Unit Masking in GPUs
Authors: M. Chow, K. Ranganath, R. Lerias, Mika Shanela Carodan, Daniel Wong
Venue: 2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA), November 2021
DOI: https://doi.org/10.1109/rsdha54838.2021.00011
Abstract: The frontiers of supercomputing are pushed by novel discrete accelerators. Accelerators such as GPUs are employed to enable faster execution of machine learning, scientific, and high-performance computing applications. However, extracting additional parallelism from traditional workloads has become harder, which has shifted attention toward task graphs. AMD's Directed Acyclic Graph Execution Engine (DAGEE) allows the programmer to define a workload as fine-grained tasks while the system handles the dependencies at a lower level. We evaluate DAGEE with the Winograd-Strassen matrix multiplication algorithm and show that DAGEE achieves an average 15.3% speedup over the traditional matrix multiplication algorithm. Using DAGEE, however, may increase contention among kernels due to the added parallelism. AMD allows the programmer to set the number of active Compute Units (CUs) by masking; this fine-grained scaling lets system software enable only the required number of CUs within a GPU. Using this mechanism, we develop a runtime that masks CUs for each task during task graph execution and partitions tasks onto separate CUs, reducing overall contention and energy consumption. We show that our CU-masking runtime reduces energy by 18% on average.
{"title":"Multi-accelerator Neural Network Inference in Diversely Heterogeneous Embedded Systems","authors":"Ismet Dagli, M. Belviranli","doi":"10.1109/rsdha54838.2021.00006","DOIUrl":"https://doi.org/10.1109/rsdha54838.2021.00006","url":null,"abstract":"Neural network inference (NNI) is commonly used in mobile and autonomous systems for latency-sensitive critical operations such as obstacle detection and avoidance. In addition to latency, energy consumption is also an important factor in such workloads, since the battery is a limited resource in such systems. Energy and latency demands of critical workload execution in such systems can vary based on the physical system state. For example, the remaining energy on a low-running battery should be prioritized for motor consumption in a quadcopter. On the other hand, if the quadcopter is flying through obstacles, latency-aware execution becomes a priority. Many recent mobile and autonomous system-on-chips embed a diverse range of accelerators with varying power and performance characteristics which can be utilized to achieve this fine trade-off between energy and latency.In this paper, we investigate Multi-accelerator Execution (MAE) on diversely heterogeneous embedded systems, where sub-components of a given workload, such as NNI, can be assigned to different type of accelerators to achieve a desired latency or energy goal. We first analyze the energy and performance characteristics of execution of neural network layers on different type of accelerators. We then explore energy/performance trade-offs via layer-wise scheduling for NNI by considering different layer-to-PE mappings. We finally propose a customizable metric, called multi-accelerator execution gain (MAEG), in order to measure the energy or performance benefits of MAE of a given workload. Our empirical results on Jetson Xavier SoCs show that our methodology can provide up to 28% energy/performance trade-off benefit when compared to the case where all layers are assigned to a single PE.","PeriodicalId":119942,"journal":{"name":"2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131100623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Comparing LLC-Memory Traffic between CPU and GPU Architectures
Authors: Mohammad Alaul Haque Monil, Seyong Lee, J. Vetter, A. Malony
Venue: 2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA), November 2021
DOI: https://doi.org/10.1109/rsdha54838.2021.00007
Abstract: The cache hierarchy in modern CPUs and GPUs is becoming increasingly complex, which makes understanding the handshake between memory access patterns and the cache hierarchy difficult. Moreover, the details of different cache policies are not publicly available. Therefore, the research community relies on observation to understand the relationship between memory access patterns and the cache hierarchy. Our previous studies delved into the different microarchitectures of Intel CPUs. In this study, GPUs from NVIDIA and AMD are considered. Even though the execution models of CPUs and GPUs are distinct, this study attempts to correlate the behavior of the cache hierarchies of CPUs and GPUs. Using the knowledge gathered from studying Intel CPUs, the similarities and dissimilarities between CPUs and GPUs are identified. Through model evaluation, this study provides a proof of concept that traffic between the last-level cache and memory can be predicted for sequential streaming and strided access patterns on GPUs.
{"title":"Distributed Training for High Resolution Images: A Domain and Spatial Decomposition Approach","authors":"A. Tsaris, Jacob D. Hinkle, D. Lunga, P. Dias","doi":"10.2172/1827010","DOIUrl":"https://doi.org/10.2172/1827010","url":null,"abstract":"In this work we developed two Pytorch libraries using the PyTorch RPC interface for distributed deep learning approaches on high resolution images. The spatial decomposition library allows for distributed training on very large images, which otherwise wouldn’t be possible on a single GPU. The domain parallelism library allows for distributed training across multiple domain unlabeled data, by leveraging the domain separation architecture. Both of those libraries where tested on the Summit supercomputer at Oak Ridge National Laboratory at a moderate scale.","PeriodicalId":119942,"journal":{"name":"2021 IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop (RSDHA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}