{"title":"Improving Inference Latency and Energy of Network-on-Chip based Convolutional Neural Networks through Weights Compression","authors":"G. Ascia, V. Catania, John Jose, Salvatore Monteleone, M. Palesi, Davide Patti","doi":"10.1109/IPDPSW50202.2020.00017","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00017","url":null,"abstract":"Network-on-Chip (NoC) based Convolutional Neural Network (CNN) accelerators are energy and performance limited by the communication traffic. In fact, to run an inference, the amount of traffic generated both on-chip and off-chip to fetch the parameters of the network, namely, filters and weights, accounts for a large fraction of the energy and latency. This paper presents a technique for compressing the network parameters in such a way to reduce the amount of traffic for fetching the network parameters thus improving the overall performance and energy figures of the accelerator. The lossy nature of the proposed compression technique results in a degradation of the accuracy of the network which we show being, nevertheless, widely justified by the achievable latency and energy consumption improvements. The proposed technique is applied to several widespread CNN models in which the trade-off accuracy vs. inference latency and inference energy is discussed. We show that up to 63% inference latency reduction and 67% inference energy reduction can be achieved with less than 5% top 5 accuracy degradation without the need of retraining the network.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125903255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching Modern Multithreading in CS2 with Actors","authors":"Mark C. Lewis, Lisa L. Lacher","doi":"10.1109/IPDPSW50202.2020.00061","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00061","url":null,"abstract":"Explosive growth in multiprocessor computing and the pervasive nature of multicore processors has not only made multithreading and related topics such as parallelism, concurrency, synchronization, etc. an essential part of any undergraduate Computer Science curriculum, it has also lead to the addition of newer constructs to support multithreading in many languages. Not only is it important to motivate student interest in this topic, it is important that they are also educated in current methods used in industry. This can mean an increase in material that needs to be covered. Because of the increase in scope of a CS education, teaching topics in parallel and distributed computing in a hands-on manner is challenging, thus it is valuable for educators to explore different methods of educational delivery in order to best engage their students within the limits of curriculum timelines. The actor model is immensely popular in industry and runs some of the most important software today. In this paper, we describe how we are using Actors as a significant part of the multithreading coverage at the CS2 level, for first-year computer science majors. We also describe a semester-long project that involves the use of these concepts to help solidify student understanding and present student feedback on the project and approach.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126042530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving HLS Generated Accelerators Through Relaxed Memory Access Scheduling","authors":"Johanna Rohde, Karsten Müller, C. Hochberger","doi":"10.1109/IPDPSW50202.2020.00020","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00020","url":null,"abstract":"High-Level-Synthesis can be used to generate hardware accelerators for compute intense software parts (so called kernels). For meaningful acceleration, such kernels should be able to autonomously access the memory. Unfortunately, such memory accesses can constitute dependences (e.g. writing an array before reading from it) leading to bottlenecks. The analysis of potential conflicts of memory accesses is often difficult and in many cases not even possible. In order to improve the scheduling of memory accesses, we propose a novel methodology to fully automatically place bypasses and squashes into the data flow graph that is used to generate the hardware accelerator. Evaluating our approach with the Powerstone benchmark suite, we can show that execution time is reduced on average by 6.5%.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125659849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving MPI Application Communication Time with an Introspection Monitoring Library","authors":"E. Jeannot, Richard Sartori","doi":"10.1109/IPDPSW50202.2020.00124","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00124","url":null,"abstract":"In this paper we describe how to improve communication time of MPI parallel applications with the use of a library that enables to monitor MPI applications and allows for introspection (the program itself can query the state of the monitoring system). Based on previous work, this library is able to see how collective communications are decomposed into point-to-point messages. It also features monitoring sessions that allow suspending and restarting the monitoring, limiting it to specific portions of the code. Experiments show that the monitoring overhead is very small and that the proposed features allow for dynamic and efficient rank reordering enabling up to 2-time reduction of communication parts of some program.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125813539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the EduPar-20 Workshop Chairs","authors":"S. Prasad, T. Newhall, David P. Bunde, Martina Barnas, S. Puri","doi":"10.1109/ipdpsw50202.2020.00053","DOIUrl":"https://doi.org/10.1109/ipdpsw50202.2020.00053","url":null,"abstract":"Welcome to the NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar-20) proceedings. The EduPar-20 workshop, held in conjunction with the IEEE International Parallel and Computing Symposium (IPDPS), is devoted to the development and assessment of educational and curricular innovations and resources for undergraduate and graduate education in Parallel and Distributed Computing (PDC). EduPar brings together individuals from academia, industry, and other educational and research institutes to explore new ideas, challenges, and experiences related to PDC pedagogy and curricula. The workshop is designed in coordination with the IEEE TCPP curriculum initiative on parallel and distributed computing (http://www.cs.gsu.edu/~tcpp/curriculum) for computer science and computer engineering undergraduates, and is supported by the NSF and the NSF-supported Center for Parallel and Distributed Computing Curriculum Development and Educational Resources (CDER).","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125459875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Pass Softmax Algorithm","authors":"Marat Dukhan, Artsiom Ablavatski","doi":"10.1109/IPDPSW50202.2020.00074","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00074","url":null,"abstract":"The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in three passes: the first pass to compute the normalization constant, and two other passes to compute outputs from normalized inputs. We analyze two variants of the Three-Pass algorithm and demonstrate that in a well-optimized implementation on HPC-class processors performance of all three passes is limited by memory bandwidth.We then present a novel algorithm for softmax computation in just two passes. The proposed Two-Pass algorithm avoids both numerical overflow and the extra normalization pass by employing an exotic representation for intermediate values, where each value is represented as a pair of floating-point numbers: one representing the “mantissa” and another representing the “exponent”.Performance evaluation demonstrates that on out-of-cache inputs on an Intel Skylake-X processor the new Two-Pass algorithm outperforms the traditional Three-Pass algorithm by up to 28% in AVX512 implementation, and by up to 18% in AVX2 implementation. The proposed Two-Pass algorithm also outperforms the traditional Three-Pass algorithm on Intel Broadwell and AMD Zen 2 processors.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"26 17","pages":"386-395"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141207283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Porting a Legacy CUDA Stencil Code to oneAPI","authors":"Steffen Christgau, T. Steinke","doi":"10.1109/IPDPSW50202.2020.00070","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00070","url":null,"abstract":"Recently, Intel released the oneAPI programming environment. With Data Parallel C++(DPC++), oneAPI enables codes to target multiple hardware architectures like multi-core CPUs, GPUs, and even FPGAs or other hardware using a single source. For legacy codes that were written for Nvidia GPUs, a compatibility tool is provided which facilitates the transition to the SYCL-based DPC++ programming language. This paper presents early experiences when using both the compatibility tool and oneAPI as well the employed extension to the SYCL programming standard for the tsunami simulation code easyWave. A performance study compares the original code running on Xeon processors using OpenMP as well as CUDA with the performance of the DPC++ counter part on multicore CPUs as well as integrated GPUs.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132011267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Deep Learning Model Inferences for Image Classification using OpenVINO","authors":"Zheming Jin, H. Finkel","doi":"10.1109/IPDPSW50202.2020.00152","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00152","url":null,"abstract":"It may be desirable to execute deep learning model inferences on an integrated GPU at the edge. While such GPUs are much less powerful than discrete GPUs, it is able to deliver higher floating-point operations per second than a CPU located on the same die. For edge devices, the benefit of moving to lower precision with minimal loss of accuracy to obtain higher performance is also attractive. Hence, we chose 14 deep learning models for image classification to evaluate their inference performance with the OpenVINO toolkit. Then, we analyzed the implementation of the fastest inference model of all the models. The experimental results are promising. Compared to the performance of full-precision (FP32) models, the speedup of the 8-bit (INT8) quantization ranges from 1.02 to 1.56 on an Intel® Xeon® 4-core CPU, and the speedup of the FP16 models ranges from 1.1 to 2 on an Intel® IrisTM Pro GPU. For the FP32 models, the GPU is on average 1.5X faster than the CPU.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"89 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128004381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Selection of Tuning Plugins in PTF Using Machine Learning","authors":"Robert Mijakovic, M. Gerndt","doi":"10.1109/IPDPSW50202.2020.00069","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00069","url":null,"abstract":"Performance tuning of scientific codes often requires tuning many different aspects like vectorization, OpenMP synchronization, MPI communication, and load balancing. The Periscope Tuning Framework (PTF), an online automatic tuning framework, relies on a flexible plugin mechanism providing tuning plugins for different tuning aspects. Individual plugins can be combined for convenience into meta-plugins. Since each plugin can take considerable execution time for testing various combination of the tuning parameters, it is desirable to automatically predict the tuning potential of plugins for programs before their application. We developed a generic automatic prediction mechanism based on machine learning techniques for this purpose. This paper demonstrates this technique in the context of the Compiler Flags Selection plugin, that tunes the parameters of a user specified compiler for a given application.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134237982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Multicore CPU Implementation for Convolution-Pooling Computation in CNNs","authors":"Hiroki Kataoka, Kohei Yamashita, Yasuaki Ito, K. Nakano, Akihiko Kasagi, T. Tabaru","doi":"10.1109/IPDPSW50202.2020.00097","DOIUrl":"https://doi.org/10.1109/IPDPSW50202.2020.00097","url":null,"abstract":"The main contribution of this paper is to present an efficient multicore CPU implementation of convolution-pooling computation in convolutional neural networks (CNNs). Since the convolution and pooling operations are performed several times in most CNNs, we propose a method to accelerate the operations. In our proposed multicore CPU implementation, we use convolution interchange to reduce the computational cost. Also, we implement convolution-pooling computation efficiently using DNNL that is an open source library for accelerating deep learning frameworks. The experimental results using Intel Corei9-7980XE CPU show that our proposed CPU implementation for the convolution-pooling is 1.42 to 2.82 times faster than the multiple convolution and then pooling by DNNL. Further, we incorporate the proposed implementation into TensorFlow to perform them as a TensorFloW operation. The incorporated implementation for the convolution-pooling is 1.18 to 2.42 times faster than straightforward implementation by primitives in TensorFlow.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133306341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}