{"title":"A Re-Configurable Ray-Triangle Vector Accelerator for Emerging Fog Architectures","authors":"Adrianno Sampaio, A. Sena, A. S. Nery","doi":"10.1109/IPDPSW.2019.00136","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00136","url":null,"abstract":"One of the biggest challenges in computer graphics is to produce photo-realistic images from a three-dimensional scene. On one hand, there are fast ways of rendering an image that often cannot portray the light behavior accurately. On the other hand, the most accurate methods, like the Ray-Tracing algorithm, are very costly regarding computing resources and take a substantial amount of time to render a single frame. Many new techniques have been conceived to accelerate ray-tracing applications while obtaining results close to the desired quality. Moreover, Field-Programmable Gate Arrays (FPGAs) have recently become useful not only to prototype novel systems but also to run specialized parallel accelerators that execute the critical path of a given application. Nonetheless, embedded devices with processing capabilities and internet access generate a substantial increase in network traffic toward distributed systems and cloud services, stimulating the development of Edge/Fog/In-Situ architectures and technologies. Thus, in this work, we present and analyze a Re-configurable Vector Accelerator specified in High-Level Synthesis (HLS) and the concept of a fog system that may use it. The accelerator is specialized in computing ray-triangle intersections and can be used in a distributed rendering environment. It has been implemented in a Xilinx Kintex Ultrascale FPGA (xcku060-ffva1156-2-e) using Xilinx Vivado tools. Experimental performance and energy consumption results show that the accelerator can efficiently render a simplified version of the Stanford Bunny model using different configurations with 1, 2, 4, and 8 Vector Cores.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127099300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation Planning Using Component Based Cost Model","authors":"A. Dubey, S. Chawdhary, J. A. Harris, O. E. Bronson Messer","doi":"10.1109/IPDPSW.2019.00116","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00116","url":null,"abstract":"Successful simulations for scientific discovery on high-performance computing platforms require careful planning, including verification of specific application configuration and runtime parameters, estimation of resource requirements, and steering and monitoring of the simulation. However, simulation planning is an aspect of scientific computing that is only sparsely covered in the available literature and training. In this paper, we focus on the resource management aspect of such planning through the formulation of a component-based cost model. We illustrate the methodology and formulation through FLASH, a highly configurable simulation code used in multiple scientific domains.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116014900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"You've Got Mail (YGM): Building Missing Asynchronous Communication Primitives","authors":"Benjamin W. Priest, Trevor Steil, G. Sanders, R. Pearce","doi":"10.1109/IPDPSW.2019.00045","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00045","url":null,"abstract":"The Message Passing Interface (MPI) is the de facto standard for message handling in distributed computing. MPI collective communication schemes, in which many processors communicate with one another, depend upon synchronous handshake agreements. As a result, applications that depend upon iterative collective communications move at the speed of their slowest processors. We describe a methodology for bootstrapping asynchronous communication primitives onto MPI, with an emphasis on the irregular and imbalanced all-to-all communication patterns found in many data analytics applications. In such applications, the communication payload between a pair of processors is often small, requiring message aggregation on modern networks. In this work, we develop novel routing schemes that divide routing logically into local and remote routing. In these schemes, each core on a node is responsible for handling all local node sends and/or receives with a subset of remote cores. Collective communications route messages along their designated intermediaries and are not influenced by the availability of cores not on their route. Unlike conventional synchronous collectives, cores participating in these schemes can enter the protocol when ready and exit once all of their sends and receives are processed. We demonstrate, using simple benchmarks, how this collective communication improves overall wall clock performance, as well as bandwidth and core utilization, for applications with a high demand for arbitrary core-to-core communication and unequal computational load between cores.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124250631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficiently Computing the Power Set in a Parallel Environment","authors":"R. Goodwin","doi":"10.1109/IPDPSW.2019.00100","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00100","url":null,"abstract":"We develop an approach to find the power set of a given set based on creating disjunctive normal form (DNF) clauses and a round-robin load balancing algorithm in a parallel computing environment. Given a problem of size n, the DNF algorithms and the round-robin load balancing compute the entire power set concurrently in O(2^n / ⌊n/2⌋) iterations. This is significantly fewer iterations than the O(2^n) required by the sequential power set algorithms found in computer science textbooks and internet searches. The round-robin load balancing algorithm assigns fewer than ⌊n/2⌋ processors to the power set problem of size n. This paper gives examples of the power set problem for relatively large sets.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114113457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Peachy Parallel Assignments (EduPar 2019)","authors":"O. Ozturk, Ben Glick, Jens Mache, David P. Bunde","doi":"10.1109/IPDPSW.2019.00064","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00064","url":null,"abstract":"Peachy Parallel Assignments are a resource for instructors teaching parallel and distributed programming. These are high-quality assignments, previously tested in class, that are readily adoptable. This collection of assignments includes face recognition, finding the electrical potential of a square wire, and heat diffusion. All of these come with sample assignment sheets and the necessary starter code.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126303124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ArrOW: Experiencing a Parallel Cloud-Based De Novo Assembler Workflow","authors":"Kary A. C. S. Ocaña, Thaylon Guedes, Daniel de Oliveira","doi":"10.1109/IPDPSW.2019.00039","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00039","url":null,"abstract":"Advances in next-generation sequencing technologies have resulted in the generation of an unprecedented volume of sequence data. DNA segments are combined into a reconstruction of the original genome using computer software called genome assemblers. Therefore, assembly now presents new challenges in terms of data management, query, and analysis due to the huge number of read sequences and compute- and memory-intensive algorithms. This restriction reduces the chances of uniformly covering the space for exploring statistics, k-mers, software, or eukaryotic genome assembly. To address these issues, we present ArrOW, a cloud-based de novo Assembly clOud Workflow that explores the potential of provenance analytics and parallel computation provided by scientific workflow management systems such as SciCumulus. We evaluate the overall performance of ArrOW using up to 256 cores in the Amazon AWS cloud. ArrOW achieves improvements of up to 88.3% executing 1,000 reads of genomics datasets. We also highlight how data provenance analytics improved the efficiency of recovering assembly features of genomes.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131972839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching Parallel Computing and Dependence Analysis with Python","authors":"Neftali Watkinson, Aniket Shivam, A. Nicolau, A. Veidenbaum","doi":"10.1109/IPDPSW.2019.00061","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00061","url":null,"abstract":"Languages with a high level of abstraction, such as Python, are becoming popular among programmers and are being adopted as the primary programming language in pedagogy. A potential drawback of using such languages is that architectural aspects, such as data layout in memory, get completely hidden. Therefore, students have difficulty understanding advanced computer science topics such as Parallel Computing. Computer architectures have evolved to allow multiple levels of parallelism. From mobile devices to supercomputers, many tasks are performed in parallel. Parallel programming models have become ubiquitous, and computer science graduates should know how to take advantage of those models. Therefore, it becomes necessary to expose students to the concepts of parallel programming early in the curriculum. This work describes a lesson plan for teaching Parallel Computing, using Data Dependence analysis and Loop transformations, to Python programming students. We analyze our teaching experience, evaluate students' understanding, and assess the likelihood of using parallel programming in introductory courses in the future.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131902828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Decompression of Gzip-Compressed Files and Random Access to DNA Sequences","authors":"Mael Kerbiriou, R. Chikhi","doi":"10.1109/IPDPSW.2019.00042","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00042","url":null,"abstract":"Decompressing a file made by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, no existing program can take advantage of parallelism to rapidly decompress large gzip-compressed files. This is an unsatisfactory bottleneck, especially for the analysis of large sequencing data experiments. Here we propose a parallel algorithm and an implementation, pugz, that performs fast and exact decompression of any text file. We show that pugz is an order of magnitude faster than gunzip, and 5x faster than a highly-optimized sequential implementation (libdeflate). We also study the related problem of random access to compressed data. We give simple models and experimental results that shed light on the structure of gzip-compressed files containing DNA sequences. Preliminary results show that random access to sequences within a gzip-compressed FASTQ file is almost always feasible at low compression levels, yet is approximate at higher compression levels.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"322 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115566773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delta-Stepping SSSP: From Vertices and Edges to GraphBLAS Implementations","authors":"Upasana Sridhar, Mark P. Blanco, Rahul Mayuranath, Daniele G. Spampinato, Tze Meng Low, Scott McMillan","doi":"10.1109/IPDPSW.2019.00047","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00047","url":null,"abstract":"GraphBLAS is an interface for implementing graph algorithms. Algorithms implemented using the GraphBLAS interface are cast in terms of linear algebra-like operations. However, many graph algorithms are canonically described in terms of operations on vertices and/or edges. Despite the known duality between these two representations, the differences in the way algorithms are described using the two approaches can pose considerable difficulties in the adoption of GraphBLAS as a standard interface for development. This paper investigates a systematic approach for translating a graph algorithm described in the canonical vertex and edge representation into an implementation that leverages the GraphBLAS interface. We present a two-step approach to this problem. First, we express common vertex- and edge-centric design patterns using a linear algebraic language. Second, we map this intermediate representation to the GraphBLAS interface. We illustrate our approach by translating the delta-stepping single source shortest path algorithm from its canonical description to a GraphBLAS implementation, and highlight lessons learned from implementing with GraphBLAS.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115585449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programmable Acceleration for Sparse Matrices in a Data-Movement Limited World","authors":"Arjun Rawal, Yuanwei Fang, A. Chien","doi":"10.1109/IPDPSW.2019.00016","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00016","url":null,"abstract":"Data movement cost is a critical performance concern in today's computing systems. We propose a heterogeneous architecture that combines a CPU core with an efficient data recoding accelerator and evaluate it on sparse matrix computation. Such computations underly a wide range of important computations such as partial differential equation solvers, sequence alignment, and machine learning and are often data movement limited. The data recoding accelerator is orders of magnitude more energy efficient than a conventional CPU for recoding, allowing sparse matrix representation to be optimized for data movement. We evaluate the heterogeneous system with a recoding accelerator using the TAMU sparse matrix library, studying >369 diverse sparse matrix examples finding geometric mean performance benefits of 2.4x. In contrast, CPU's exhibit poor recoding performance (up to 30x worse), making data representation optimization infeasible. Holding SpMV performance constant, adding the recoding optimization and accelerator can produce power reductions of 63% and 51% on DDR and HBM-based memory systems, respectively, when evaluated on a set of 7 representative matrices. 
These results show the promise of this new heterogeneous architecture approach.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124373705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}