2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW): Latest Papers

Dataflow Programming for Stream Processing
Marcos Paulo Rocha, F. França, A. S. Nery, Leandro S. Guedes
{"title":"Dataflow Programming for Stream Processing","authors":"Marcos Paulo Rocha, F. França, A. S. Nery, Leandro S. Guedes","doi":"10.1109/SBAC-PADW.2017.26","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.26","url":null,"abstract":"Stream processing applications have demanding performance requirements that are hard to meet using traditional parallel models on modern many-core architectures, such as GPUs. On the other hand, recent dataflow computing models can naturally exploit parallelism for a wide class of applications. This work presents an extension to an existing dataflow library for Java. The library extension implements high-level constructs with multiple command queues to enable the overlap of memory operations and kernel executions on GPUs. Experimental results show that significant speedup can be achieved for a subset of well-known stream processing applications: Volume Ray-Casting, Path-Tracing and Sobel Filter.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122615046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Efficient Pathfinding Co-Processors for FPGAs
A. S. Nery, A. Sena, Leandro S. Guedes
{"title":"Efficient Pathfinding Co-Processors for FPGAs","authors":"A. S. Nery, A. Sena, Leandro S. Guedes","doi":"10.1109/SBAC-PADW.2017.25","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.25","url":null,"abstract":"Pathfinding algorithms are at the heart of several classes of applications, such as network appliances (routing), GPS navigation and autonomous cars, which are related to recent trends in Artificial Intelligence and Internet of Things (IoT). Moreover, advances in semiconductor miniaturization technologies have enabled the design of efficient Systems-on-Chip (SoC) devices, with demanding performance requirements and energy consumption constraints. Such systems might include Field Programmable Gate Arrays (FPGAs) to allow the design of customized co-processors that yield lower power consumption and higher performance. Therefore, this work aims at designing and evaluating four efficient pathfinding co-processors, each one implementing a different well-known pathfinding algorithm: breadth-first, Dijkstra, greedy and A-star. Each co-processor is designed using the Xilinx High-Level Synthesis (HLS) compiler and is implemented in the programmable logic of a Xilinx FPGA embedded with an ARM microprocessor, which is in charge of controlling the set of co-processors. Extensive performance, circuit-area and energy consumption results show that each co-processor can efficiently execute a pathfinding algorithm, paving the way for novel dedicated accelerators.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"Volume 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124431825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
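The four algorithms named in the abstract above (breadth-first, Dijkstra, greedy and A-star) can be viewed as one frontier-based search that differs only in how the frontier is ordered. The following is a minimal software sketch of that idea on a unit-cost grid; it is not the paper's HLS co-processor design, and the names `search`, `manhattan` and the grid encoding (1 = wall) are assumptions made for illustration.

```python
import heapq
from itertools import count


def manhattan(a, b):
    """Grid distance used as the heuristic (an illustrative choice)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def search(grid, start, goal, mode):
    """Return the number of steps of the path found, or None if unreachable.

    mode selects the frontier ordering:
      "bfs"      -> first in, first out
      "dijkstra" -> cost so far
      "greedy"   -> heuristic only (path may be non-optimal in general)
      "astar"    -> cost so far + heuristic
    """
    tie = count()  # insertion counter: breaks ties and gives FIFO order for BFS
    frontier = [(0, next(tie), start)]
    cost = {start: 0}
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if node == goal:
            return cost[node]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if not inside or grid[nr][nc] == 1:
                continue  # off the grid or a wall
            new_cost = cost[node] + 1
            if (nr, nc) not in cost or new_cost < cost[(nr, nc)]:
                cost[(nr, nc)] = new_cost
                h = manhattan((nr, nc), goal)
                if mode == "bfs":
                    priority = 0
                elif mode == "dijkstra":
                    priority = new_cost
                elif mode == "greedy":
                    priority = h
                elif mode == "astar":
                    priority = new_cost + h
                else:
                    raise ValueError("unknown mode: " + mode)
                heapq.heappush(frontier, (priority, next(tie), (nr, nc)))
    return None
```

Sharing one loop makes the trade-off visible: only the priority expression changes between the variants, which is also why they are natural candidates for a common hardware datapath, although the abstract does not say whether the paper's designs share one.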
Automatic Scan Parallelization in OpenMP
Maicol Zegarra, M. Pereira, X. Martorell, G. Araújo
{"title":"Automatic Scan Parallelization in OpenMP","authors":"Maicol Zegarra, M. Pereira, X. Martorell, G. Araújo","doi":"10.1109/SBAC-PADW.2017.23","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.23","url":null,"abstract":"Prefix Scan (or simply scan) is an operator that computes all the partial sums of a vector. A scan operation results in a vector where each element is the sum of the preceding elements in the original vector up to the corresponding position. Scan is a key operation in many relevant problems like sorting, lexical analysis, string comparison and image filtering, among others. Although there are libraries that provide hand-parallelized implementations of scan in CUDA and OpenCL, no automatic parallelization solution exists for this operator in OpenMP. This paper proposes a new clause for OpenMP which enables the automatic synthesis of the parallel scan. By using the proposed clause a programmer can considerably reduce the complexity of designing scan-based algorithms, thus allowing him or her to focus on the problem and not on learning new parallel programming models or languages. Scan was designed in AClang, an open-source LLVM/Clang compiler framework that implements the recently released OpenMP 4.X Accelerator Programming Model. Experiments running a set of typical scan-based algorithms on NVIDIA, Intel, and ARM GPUs reveal that the performance of the proposed OpenMP clause is equivalent to that achieved when using OpenCL library calls, with the advantage of lower programming complexity.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132919374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
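The scan operator defined in the abstract above can be illustrated with a short sketch. The first function is a sequential reference; the second uses the well-known Hillis-Steele formulation, chosen here only to show why the operation parallelizes. Neither is the OpenMP clause proposed in the paper nor AClang's generated code, and both function names are hypothetical.

```python
from itertools import accumulate


def inclusive_scan(values):
    """Sequential reference: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))


def hillis_steele_scan(values):
    """Step-parallel inclusive scan in about log2(n) rounds.

    Within a round, every output element depends only on the previous
    round's array, so all n updates could run concurrently on a GPU.
    Here the rounds are simulated with a list comprehension.
    """
    out = list(values)
    offset = 1
    while offset < len(out):
        out = [out[i] + out[i - offset] if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out
```

For the input `[3, 1, 4, 1, 5]` both functions return `[3, 4, 8, 9, 14]`. The step-parallel version performs more additions in total than the sequential one, which is the usual price paid for the shorter dependency chain.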
A Case Study of Performance Optimization in a Heterogeneous Environment
Leandro Pereira, C. Bentes, Maria Clicia Stelling de Castro, E. Garcia
{"title":"A Case Study of Performance Optimization in a Heterogeneous Environment","authors":"Leandro Pereira, C. Bentes, Maria Clicia Stelling de Castro, E. Garcia","doi":"10.1109/SBAC-PADW.2017.11","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.11","url":null,"abstract":"The optimization of legacy codes for fully exploiting the parallelism opportunities provided by modern heterogeneous architectures is a difficult task. Multiple levels of parallelism can be exploited in order to gain the expected performance. This work describes the lessons learned in the performance optimization of a real-world reservoir engineering application composed of thousands of lines of code. We study the exploitation of the multiple levels of parallelism, showing a possible, although non-trivial, path to extract performance. Our results show that exploiting thread-level parallelism is not always the best path to derive performance gains. On the other hand, vectorization plays a key role in reducing the execution time of the application.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"689 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127684573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HPSM: A Programming Framework for Multi-CPU and Multi-GPU Systems
J. F. Lima, D. D. Domenico
{"title":"HPSM: A Programming Framework for Multi-CPU and Multi-GPU Systems","authors":"J. F. Lima, D. D. Domenico","doi":"10.1109/SBAC-PADW.2017.14","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.14","url":null,"abstract":"This paper presents a high-level C++ framework to explore multi-CPU and multi-GPU systems called HPSM. HPSM enables parallel loops and reductions implemented over three parallel backends: Serial, OpenMP (with GCC and libKOMP runtime), and StarPU. We evaluated HPSM development effort with an AXPY program, and performance with three parallel benchmarks: N-Body, Hotspot, and a CFD solver. The CPU-GPU combination attained better performance than GPUs alone for all cases on a CPU-GPU system. Still, our findings provide evidence that NUMA affinity at the framework level may produce different results.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124561873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A Communication Protocol for Fog Computing Based on Network Coding Applied to Wireless Sensors
B. Marques, I. M. Coelho, A. Sena, M. D. Castro
{"title":"A Communication Protocol for Fog Computing Based on Network Coding Applied to Wireless Sensors","authors":"B. Marques, I. M. Coelho, A. Sena, M. D. Castro","doi":"10.1109/SBAC-PADW.2017.27","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.27","url":null,"abstract":"A communication protocol for fog computing should be efficient, lightweight and customizable. In this work we focus on a communication protocol for fog nodes composed of wireless sensors, which are spatially distributed autonomous sensors monitoring physical or environmental conditions. Problems with data congestion and limited physical resources are common in these networks. For the optimization of data flow, it is important to apply techniques that reduce the transmitted data. We use the network coding technique to demonstrate through experiments the degree of efficiency of data transmission optimization protocols. The experiments were performed through a wireless sensor programming framework composed of the TinyOS operating system, the NesC programming language and the TOSSIM simulator. In addition, we use the Python programming language to simulate the wireless sensor network topology. The results obtained demonstrate better performance (50% to 60%) when the network coding technique is applied to the data communication protocol.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128282177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
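The abstract above does not state which coding scheme the protocol uses. As a hedged illustration of how network coding can reduce transmitted data, the sketch below uses the textbook XOR case: a relay that holds one packet from each of two nodes broadcasts their XOR once instead of forwarding both packets separately, and each node recovers the other's packet with its own copy. The function name `xor_bytes` and the equal-length packets are assumptions.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Node A and node B each send one reading to a relay (hypothetical payloads).
packet_a = bytes([0x10, 0x20, 0x30])
packet_b = bytes([0x01, 0x02, 0x03])

# The relay broadcasts one coded packet instead of two plain ones.
coded = xor_bytes(packet_a, packet_b)

# Each node decodes the other's packet using the packet it already holds.
recovered_by_a = xor_bytes(coded, packet_a)  # equals packet_b
recovered_by_b = xor_bytes(coded, packet_b)  # equals packet_a
```

In this toy exchange the relay sends one packet where plain forwarding would send two. The 50% to 60% figure in the abstract comes from the paper's own experiments and is not reproduced by this example.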
Assessing Sparse Triangular Linear System Solvers on GPUs
Daniel Erguiz, Ernesto Dufrechu, P. Ezzatti
{"title":"Assessing Sparse Triangular Linear System Solvers on GPUs","authors":"Daniel Erguiz, Ernesto Dufrechu, P. Ezzatti","doi":"10.1109/SBAC-PADW.2017.15","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.15","url":null,"abstract":"A significant number of Numerical Linear Algebra methods for tackling problems in diverse fields of science and engineering rely heavily on the solution of one or many sparse triangular linear systems. Since the early years, this has motivated numerous efforts that seek to produce efficient implementations of this kernel for most hardware platforms. However, this operation implies strong data dependencies and unbalanced computations that hinder concurrency, especially when massively parallel processors such as GPUs are employed. In this work we review the different techniques to expose the data parallelism in this operation, with special attention to the many-core based proposals. Additionally, we experimentally evaluate the two most successful approaches, namely the routine that is included in the CUSPARSE library and the synchronization-free method of W. Liu et al. [1]. Finally, we advance in the characterization of the triangular sparse linear systems to select the best solver in each case.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115658789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
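The data dependencies mentioned in the abstract above can be seen in a plain forward-substitution sketch for a lower-triangular matrix in CSR format. The second function computes dependency levels (often called level scheduling), a classic way to find rows that can be solved concurrently; it is shown only as an illustration and is not claimed to be the exact strategy of either solver evaluated in the paper. Function names and the CSR argument names are assumptions.

```python
def solve_lower_csr(indptr, indices, data, b):
    """Solve L x = b by forward substitution; L is lower triangular in CSR."""
    n = len(b)
    x = [0.0] * n
    for row in range(n):
        acc = b[row]
        diag = None
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            if col == row:
                diag = data[k]
            else:
                # x[col] must already be known (col < row): the data dependency.
                acc -= data[k] * x[col]
        if diag is None:
            raise ValueError("missing diagonal entry in row %d" % row)
        x[row] = acc / diag
    return x


def dependency_levels(indptr, indices):
    """Level of a row = 1 + highest level among the rows it depends on.

    Rows that share a level have no dependencies on each other and could be
    processed in parallel; the number of levels bounds the sequential steps.
    """
    n = len(indptr) - 1
    level = [0] * n
    for row in range(n):
        deps = [level[indices[k]]
                for k in range(indptr[row], indptr[row + 1])
                if indices[k] != row]
        level[row] = (1 + max(deps)) if deps else 0
    return level
```

For the 3x3 system with rows `[2, 0, 0]`, `[1, 1, 0]`, `[0, 0, 4]` and right-hand side `[2, 3, 8]`, rows 0 and 2 fall in level 0 and row 1 in level 1, so two of the three unknowns are independent. Matrices with long dependency chains yield many small levels, which is one source of the load imbalance the abstract describes.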
Automatic Partitioning of Stencil Computations on Heterogeneous Systems
Alyson D. Pereira, Rodrigo C. O. Rocha, Luiz E. Ramos, M. Castro, L. F. Góes
{"title":"Automatic Partitioning of Stencil Computations on Heterogeneous Systems","authors":"Alyson D. Pereira, Rodrigo C. O. Rocha, Luiz E. Ramos, M. Castro, L. F. Góes","doi":"10.1109/SBAC-PADW.2017.16","DOIUrl":"https://doi.org/10.1109/SBAC-PADW.2017.16","url":null,"abstract":"The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on GPUs. However, most of the runtime systems that execute those applications often fail to fully utilize the parallelism of modern heterogeneous systems. In this paper, we propose a mechanism based on machine learning that automatically partitions stencil computations across CPU and GPU. We implemented it into the PSkel framework and found that the mechanism can boost the performance of stencil applications on average by 17.9x compared to their sequential CPU-only counterparts, by 1.34x compared to a GPU-only version, and by 1.48x compared to a parallel CPU-only version.","PeriodicalId":325990,"journal":{"name":"2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122793829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
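To make the idea of partitioning a stencil computation across two devices concrete, the sketch below splits a 1D three-point averaging stencil into two contiguous ranges. It does not use the PSkel API and does not model the paper's machine-learning mechanism: the `gpu_fraction` argument is a hypothetical stand-in for whatever split ratio such a mechanism would predict, and both partitions simply run on the CPU here.

```python
def stencil_1d(data, start, stop):
    """Three-point average over data[start:stop], clamping at the array edges.

    Each output reads its left and right neighbours, so a partition needs
    read access to one element beyond each of its boundaries (the halo).
    """
    n = len(data)
    out = []
    for i in range(start, stop):
        left = data[max(i - 1, 0)]
        right = data[min(i + 1, n - 1)]
        out.append((left + data[i] + right) / 3)
    return out


def partitioned_stencil(data, gpu_fraction):
    """Split the index range in two and process each part independently."""
    split = round(len(data) * gpu_fraction)
    gpu_part = stencil_1d(data, 0, split)          # share meant for the GPU
    cpu_part = stencil_1d(data, split, len(data))  # share meant for the CPU
    return gpu_part + cpu_part
```

Because each partition only reads the shared input, the two ranges can be computed independently and the result does not depend on the split point; choosing the split affects only how well the work is balanced between devices.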