2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW): Latest Papers

CoBaS: Introducing a Component Based Scheduling Framework
Anselm Busse, R. Karnapke, Hans-Ulrich Heiß
DOI: https://doi.org/10.1109/SBAC-PADW.2015.23 | Published: 2015-10-18
Abstract: Many-core and heterogeneous systems are becoming increasingly common and may soon enter the mainstream market. To harness their full potential, the runtime system's scheduling policies have to be adapted and, in many cases, tailored to the specific system. The runtime system can be either an operating system or the management infrastructure of an infrastructure-as-a-service (IaaS) platform. Developing, implementing, and testing such scheduling policies is generally a challenging task. In this work we present CoBaS, a component-based scheduling framework for multi- and many-core runtime systems. The main purpose of CoBaS is to simplify the implementation of scheduling policies and to increase code reuse, saving development time. CoBaS takes a novel approach to reach that goal: it allows the policy implementation to be broken down into several reusable components. Through composition, new scheduling policies can be rapidly prototyped, tested, and evaluated without reimplementing every functional part. CoBaS uses an event-based approach to distribute information about system states and state changes between the runtime system and the components, as well as among the components themselves. Furthermore, it provides a facility for handing over ordered task sets between components. We have adapted both the Linux and FreeBSD kernels to use CoBaS, completely removing the native scheduler. The integration of CoBaS into those kernels shows the feasibility of our approach.
Citations: 2
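To make the composition idea in the abstract above more concrete (components receiving events about state changes and handing an ordered task set to another component), here is a minimal Python sketch. Every name in it (`EventBus`, `PriorityOrder`, `PickFirst`, the `"task_ready"`/`"task_blocked"` events) is a hypothetical illustration, not CoBaS's actual kernel interface.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    priority: int  # hypothetical convention: larger value = more urgent


class EventBus:
    """Hypothetical channel carrying state-change events to subscribed components."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[object], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: object) -> None:
        for handler in self._handlers[event]:
            handler(payload)


class PriorityOrder:
    """Component that tracks ready tasks via events and hands them over ordered."""

    def __init__(self, bus: EventBus) -> None:
        self.ready: list[Task] = []
        bus.subscribe("task_ready", self.ready.append)
        bus.subscribe("task_blocked", self.ready.remove)

    def ordered_tasks(self) -> list[Task]:
        # sorted() is stable, so tasks with equal priority keep arrival order.
        return sorted(self.ready, key=lambda t: -t.priority)


class PickFirst:
    """Component that consumes an ordered task set and selects the next task."""

    def select(self, ordered: list[Task]) -> Optional[Task]:
        return ordered[0] if ordered else None


def schedule(order: PriorityOrder, pick: PickFirst) -> Optional[Task]:
    """Compose the two components into a simple priority policy."""
    return pick.select(order.ordered_tasks())
```

Swapping `PriorityOrder` for a different ordering component changes the policy without touching `PickFirst`, which is the kind of reuse the abstract describes.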
Single-Loop Approach to 2-D Wavelet Lifting with JPEG 2000 Compatibility
David Barina, Petr Musil, M. Musil, P. Zemčík
DOI: https://doi.org/10.1109/SBAC-PADW.2015.10 | Published: 2015-10-18
Abstract: A novel approach to 2-D single-loop wavelet lifting compatible with JPEG 2000 is presented in this paper. A newly developed 2-D core of the CDF 5/3 wavelet filter is presented that simplifies the design by using a new sequence of operations. Moreover, the proposed approach, which uses a single pass for the 2-D transform, directly produces the final output and significantly reduces the need to store intermediate results in memory. All the proposed structures can be efficiently pipelined in hardware. This paper describes the proposed approach, its FPGA implementation, and the cost of such an implementation, and presents an experimental evaluation as well as a discussion of the features of the approach.
Citations: 4
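For context, the conventional reversible CDF 5/3 transform of JPEG 2000 applies one predict and one update lifting step per dimension, with symmetric extension at the borders, and builds the 2-D transform separably (all rows, then all columns). The Python sketch below shows that baseline, which is what a single-loop core aims to restructure; it is not the authors' single-loop design. The function names are hypothetical, and even-length signals are assumed for brevity.

```python
from __future__ import annotations


def fwd_53(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the reversible CDF 5/3 lifting transform (even length)."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even length >= 2"
    half = n // 2
    # Predict: detail = odd sample minus floor of the mean of its even neighbors.
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    # Update: approximation = even sample plus rounded quarter of adjacent details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d


def inv_53(s: list[int], d: list[int]) -> list[int]:
    """Undo fwd_53 exactly by running the lifting steps in reverse order."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):  # undo update: recover even samples
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(half):  # undo predict: recover odd samples
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x


def fwd_53_2d(img: list[list[int]]) -> list[list[int]]:
    """Separable 2-D transform: rows first, then columns (two passes)."""
    rows = []
    for row in img:
        s, d = fwd_53(row)
        rows.append(s + d)  # [lowpass | highpass]
    h, w = len(rows), len(rows[0])
    out = [[0] * w for _ in range(h)]
    for c in range(w):
        s, d = fwd_53([rows[r][c] for r in range(h)])
        for r, v in enumerate(s + d):
            out[r][c] = v
    return out
```

The intermediate `rows` buffer holding the whole row-transformed image is exactly the kind of stored intermediate result the abstract says the single-pass approach largely avoids.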
MDACCER: Modified Distributed Assessment of the Closeness CEntrality Ranking in Complex Networks for Massively Parallel Environments
F. L. Cabral, Carla Osthoff, D. Ramos-Castro, Rafael Nardes
DOI: https://doi.org/10.1109/SBAC-PADW.2015.28 | Published: 2015-10-18
Abstract: We propose a new method derived from DACCER (Distributed Assessment of the Closeness CEntrality Ranking), the modified DACCER (MDACCER), for assessing the traditional closeness centrality ranking. MDACCER introduces a relaxation that allows it to take advantage of massively parallel environments such as general-purpose graphics processing units (GPGPUs). The original DACCER proposal assesses the closeness centrality ranking in a limited neighborhood, using only information around each node, at low computational cost and with the ability to run in a distributed environment. Despite these advantages, DACCER faces some difficulties in the GPGPU programming model that increase its computational cost in this particular environment. In contrast to DACCER's poor performance on GPGPUs, experimental results demonstrate that MDACCER is as simple and efficient as DACCER for assessing the closeness centrality ranking in complex networks, and moreover it does not suffer from the same GPGPU bottlenecks in memory usage and time complexity. We ran MDACCER on synthetically generated networks, specifically Barabási-Albert networks, and the results indicate that MDACCER's ranking correlates with the closeness centrality ranking almost as well as DACCER's does, at a lower computational cost.
Citations: 4
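The ranking being approximated can be made concrete with a small sketch. Exact closeness centrality needs a breadth-first search from every node over the whole graph, whereas DACCER, as we understand the original proposal, scores each node by the volume of its h-hop neighborhood (the sum of the degrees of the nodes within h hops), which needs only local information. The Python code below assumes an undirected, connected, unweighted graph with at least two nodes, given as an adjacency dict; `h = 2`, the inclusion of the node itself in its neighborhood, and all function names are illustrative assumptions, and MDACCER's GPU-oriented relaxation is not reproduced.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


def bfs_distances(adj: dict[int, list[int]], src: int,
                  max_depth: Optional[int] = None) -> dict[int, int]:
    """Hop distances from src, optionally stopping at max_depth hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if max_depth is not None and dist[u] == max_depth:
            continue  # do not expand beyond the local neighborhood
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj: dict[int, list[int]]) -> dict[int, float]:
    """Exact closeness (n - 1) / sum of distances; one full BFS per node."""
    n = len(adj)
    return {u: (n - 1) / sum(bfs_distances(adj, u).values()) for u in adj}


def neighborhood_volume(adj: dict[int, list[int]], h: int = 2) -> dict[int, int]:
    """DACCER-style local score: total degree of the nodes within h hops."""
    return {u: sum(len(adj[v]) for v in bfs_distances(adj, u, h)) for u in adj}


def ranking(scores: dict[int, float]) -> list[int]:
    """Nodes sorted by decreasing score, ties broken by node id."""
    return sorted(scores, key=lambda u: (-scores[u], u))
```

On a five-node path graph, for example, both scores rank the middle node first and the two endpoints last, even though the volume only looks two hops away.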
CHAOS-MCAPI: An Optimized Mechanism to Support Multicore Parallel Programming
Antonio Ideguchi, C. E. Morón, M. M. Fernandes
DOI: https://doi.org/10.1109/SBAC-PADW.2015.12 | Published: 2015-10-18
Abstract: This paper presents CHAOS-MCAPI (Communication Header and Operating Support-Multicore Communication API), an IPC mechanism targeting message-passing parallel programming on multicore platforms. The proposed mechanism is built on top of the D-Bus protocol for message transmission, which provides a higher level of abstraction and control than lower-level mechanisms such as UNIX pipes. Optimizations adopted in the implementation of CHAOS-MCAPI resulted in significant performance gains relative to the original D-Bus implementation; these gains should be further improved by adopting KDBus, a 'zero-copy' mechanism recently made available natively in the Linux kernel. That should make CHAOS-MCAPI a viable alternative for designing and implementing parallel programs targeting multicore platforms, in terms of both scalability and programmer productivity.
Citations: 0
Painless Parallelism on Heterogeneous Hardware Leveraging the Functional Paradigm
Mauro Blanco, Pablo Perdomo, P. Ezzatti, Alberto Pardo, Marcos Viera
DOI: https://doi.org/10.1109/SBAC-PADW.2015.24 | Published: 2015-10-18
Abstract: We use a functional framework designed for parallel programming of linear algebra applications to leverage the computing power of heterogeneous hardware. Our work is carried out in the context of the pure functional programming language Haskell. The framework allows the manipulation of arbitrary matrix representations and the definition of multiple implementations of BLAS operations based on different algorithms and parallelism strategies. We benchmark representative BLAS operations on three different platforms (multi-core CPU, ARM, and GPU), applying different parallelism strategies and employing several representations.
Citations: 0
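The idea of one BLAS operation working over several matrix representations, with an interchangeable parallelism strategy, can be sketched as follows. The paper's framework is written in Haskell; this Python version is only an analogy, and `DenseMatrix`, `CooMatrix`, `gemv`, and its `parallel` flag are hypothetical names. Note that CPython threads do not speed up pure-Python arithmetic because of the global interpreter lock, so the thread pool here only illustrates how a strategy can be swapped without changing the operation.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class DenseMatrix:
    """Row-major dense representation."""
    rows: list[list[float]]

    @property
    def n_rows(self) -> int:
        return len(self.rows)

    def row_dot(self, i: int, x: list[float]) -> float:
        return sum(a * b for a, b in zip(self.rows[i], x))


@dataclass
class CooMatrix:
    """Sparse coordinate representation: (row, col, value) triples."""
    n_rows: int
    entries: list[tuple[int, int, float]]
    _by_row: dict[int, list[tuple[int, float]]] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Group the triples by row once so each row product touches only its entries.
        self._by_row = {}
        for r, c, v in self.entries:
            self._by_row.setdefault(r, []).append((c, v))

    def row_dot(self, i: int, x: list[float]) -> float:
        return sum(v * x[c] for c, v in self._by_row.get(i, []))


def gemv(a: DenseMatrix | CooMatrix, x: list[float],
         parallel: bool = False, workers: int = 4) -> list[float]:
    """y = A x for any representation exposing n_rows and row_dot."""
    if not parallel:
        return [a.row_dot(i, x) for i in range(a.n_rows)]
    # Row-parallel strategy: rows are independent, so each becomes a separate task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: a.row_dot(i, x), range(a.n_rows)))
```

Adding a new representation only requires providing `n_rows` and `row_dot`; every strategy in `gemv` then works with it unchanged.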
Using Hardware Transactional Memory to Enable Speculative Trace Optimization
Juan Salamanca, J. N. Amaral, G. Araújo
DOI: https://doi.org/10.1109/SBAC-PADW.2015.13 | Published: 2015-10-18
Abstract: This paper describes a novel speculation technique for the optimization, and simultaneous execution, of multiple alternative traces of hot code regions. This technique, called Speculative Trace Optimization (STO), enumerates, optimizes, and speculatively executes traces of hot loops. It requires hardware support that can be provided in a fashion similar to that available in Hardware Transactional Memory (HTM) systems. This paper discusses the features necessary to support STO, namely multi-versioning, lazy conflict resolution, eager conflict detection, and transaction synchronization. A review of existing HTM architectures (Intel TSX, IBM BG/Q, and IBM POWER8) shows that none of them has all the features required to implement STO. However, this work demonstrates that STO can be implemented on top of existing HTM architectures through the addition of privatization and pause/resume code. The evaluation of a prototype STO implementation on top of Intel TSX, using benchmarks from Parboil, MediaBench, and SPEC2006, indicates that STO can yield whole-program speedups of up to 9%. This initial result is promising given that the prototype has significant overhead caused by the code that compensates for the features absent from TSX. An analysis included in the paper suggests that HTM mechanisms have the potential to considerably improve trace performance, provided that they efficiently implement the suggested features.
Citations: 0
Many SVDs on GPU for Image Mosaic Assemble
I. Badolato, Luciano de Paula, R. Farias
DOI: https://doi.org/10.1109/SBAC-PADW.2015.22 | Published: 2015-10-18
Abstract: In this paper we present a homography algorithm that produces image mosaics by using parallelism to solve multiple Singular Value Decomposition (SVD) systems. We analyze four state-of-the-art SVD methods and choose the one that best suits the expected size of the matrices derived from the datasets of interest. We then use CUDA to accelerate the computation of the homogeneous transformation matrices.
Citations: 2
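How an SVD yields each homogeneous transformation matrix can be illustrated with the standard direct linear transform (DLT): every point correspondence contributes two linear equations in the nine entries of H, and the solution is the right singular vector belonging to the smallest singular value. The NumPy sketch below assumes exact correspondences, no coordinate normalization, and the same number (at least four) of correspondences per image pair; it stacks the per-pair systems so that a single batched `numpy.linalg.svd` call solves many of them, loosely mirroring the "many SVDs" idea on a CPU rather than with CUDA. The function names are hypothetical, and this is not the specific SVD method the authors selected.

```python
import numpy as np


def dlt_rows(src, dst):
    """Two DLT equations per correspondence (x, y) -> (u, v)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    return np.asarray(rows, dtype=float)


def homographies(pairs):
    """Estimate one 3x3 homography per (src, dst) pair with one batched SVD.

    All pairs must have the same number of correspondences so the systems
    stack into a single (k, 2n, 9) array.
    """
    a = np.stack([dlt_rows(src, dst) for src, dst in pairs])
    # full_matrices=True (the default) matters: with only four points each
    # system is 8x9, and the null-space vector is the ninth row of vt.
    _, _, vt = np.linalg.svd(a)
    h = vt[:, -1, :].reshape(-1, 3, 3)
    return h / h[:, 2:3, 2:3]  # fix the projective scale so that H[2, 2] = 1
```

With noisy correspondences from real images, one would normally normalize the coordinates first and use more than four points per pair, since the smallest singular vector then gives a least-squares solution rather than an exact one.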
Replicating the Performance Evaluation of an N-Body Application on a Manycore Accelerator
V. G. Pinto, Vinicius Alves Herbstrith, L. Schnorr
DOI: https://doi.org/10.1109/SBAC-PADW.2015.17 | Published: 2015-10-18
Abstract: Reproducibility for High Performance Computing (HPC) systems has been discussed for some time, but more work is needed to cover the latest accelerators that equip the fastest supercomputers, such as those listed in the Top500. In this paper, we replicate a performance evaluation carried out with an N-Body OpenMP parallel application on a Xeon Phi accelerator. We also compare the obtained performance with that of a similar N-Body CUDA application. Besides encountering intriguing results regarding the number of hardware threads on the Xeon Phi, our comparison against Nvidia boards using the same load shows that execution on the Xeon Phi is slower than on the Nvidia K20 and GTX760 accelerators.
Citations: 2
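At the heart of a typical direct N-Body code is the all-pairs force computation, which costs O(N^2) pairwise interactions per time step; parallel OpenMP or CUDA versions commonly distribute the outer loop over bodies among threads. The sequential Python sketch below is illustrative only: the softening length `eps`, the unit gravitational constant `g`, and the function name are assumptions, not details taken from the replicated application.

```python
import math


def accelerations(pos, mass, eps=1e-3, g=1.0):
    """Direct-sum gravitational acceleration on every body (O(N^2)).

    pos: list of (x, y, z) tuples; mass: list of floats.
    eps softens close encounters so that the force never divides by zero.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):  # independent iterations: the usual unit of parallel work
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx, dy, dz = pos[j][0] - xi, pos[j][1] - yi, pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            s = g * mass[j] / (r2 * math.sqrt(r2))  # g * m_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

Because each iteration of the outer loop writes only to `acc[i]`, iterations need no synchronization, which is what makes this kernel a common accelerator benchmark.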
Kanga: A Skeleton-Based Generic Interface for Parallel Programming
Deives Kist, Bruno Pinto, Rodrigo Bazo, A. R. D. Bois, G. H. Cavalheiro
DOI: https://doi.org/10.1109/SBAC-PADW.2015.16 | Published: 2015-10-18
Abstract: Concurrent programming tools strive to exploit hardware resources as much as possible. Nonetheless, because such tools lack high-level abstractions, they often require the user to have considerable knowledge in order to achieve satisfactory performance, and they do not prevent error-prone situations. In this paper we present Kanga, a framework based on the skeleton abstraction that provides a generic tool encapsulating many common parallel patterns. We validate the framework implementation through two case studies.
Citations: 5
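Algorithmic skeletons package a parallel pattern behind a higher-order function, so the user supplies only the sequential pieces. The Python sketch below shows two classic skeletons, map and a chunked reduce, built on the standard `concurrent.futures` module. The names `par_map` and `par_reduce` are hypothetical and do not describe Kanga's actual interface; because of CPython's global interpreter lock, threads mainly help I/O-bound or lock-releasing work, so a process pool may suit pure-Python computation better.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def par_map(fn, items, workers=4):
    """Map skeleton: apply fn to every item concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def par_reduce(op, items, identity, workers=4, chunks=4):
    """Reduce skeleton: fold chunks concurrently, then combine the partials.

    op must be associative and identity its neutral element; otherwise the
    result would depend on how the items are split into chunks.
    """
    items = list(items)
    size = max(1, -(-len(items) // chunks))  # ceiling division
    parts = [items[k:k + size] for k in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: reduce(op, p, identity), parts))
    return reduce(op, partials, identity)  # partials stay in input order
```

The user never touches threads or chunking: the skeleton owns the parallel structure, which is the error-prone part the abstract says low-level tools leave to the programmer.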