2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Papers

Heterogeneous Algorithmic Skeletons for Fast Flow with Seamless Coordination over Hybrid Architectures

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.29

M. Goli, H. González-Vélez

Abstract: Algorithmic skeletons ("skeletons") abstract commonly used patterns of parallel computation, communication, and interaction. They provide top-down design composition and control inheritance throughout the whole structure. The efficient execution of skeletal applications on heterogeneous environments has long been of interest to the research community. Arguably, executing coarse-grained, resource-intensive skeletal workloads ought to achieve higher resource utilisation and, ultimately, better job makespan on heterogeneous systems due to the structured parallelism model. This paper presents a heterogeneous OpenCL-based GPU back-end for FastFlow, a widely used skeletal framework. Our back-end allows the user to easily write arbitrary OpenCL code inside a heterogeneous algorithmic skeleton and seamlessly control the allocation of OpenCL kernels over the hybrid (CPU/GPU) architecture. Our performance evaluation indicates that a skeletal program which employs our back-end is around one order of magnitude faster than a skeletal parallel program using the traditional homogeneous FastFlow skeletons with the serial version of the OpenCL code.

Citations: 18

CACH-FTL: A Cache-Aware Configurable Hybrid Flash Translation Layer

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.71

Jalil Boukhobza, Pierre Olivier, S. Rubini

Abstract: Many hybrid Flash Translation Layer (FTL) schemes have been proposed to cope with the erase-before-write and limited-lifetime constraints of flash memories. These schemes try to approach the performance and flexibility of page mapping while keeping the memory usage of block mapping. Furthermore, flash-specific cache systems were designed (1) to maximize lifetime by absorbing some erase operations, and (2) to reveal sequentiality from random write operations. Indeed, random writes are the Achilles' heel of flash memories. Cache systems and FTL schemes were designed independently of each other. This paper presents a scalable (in terms of mapping table size) and flexible (in terms of I/O workload support) Cache-Aware Configurable Hybrid (CACH) FTL. CACH-FTL uses a common feature of flash-specific cache systems: flushing groups of pages from the same block. CACH-FTL partitions the flash memory space into two regions: (1) a data Block Mapped Region (BMR) collecting large groups of pages from the cache above (sequential I/Os), and (2) a small Page Mapped over-provisioning Region (PMR) whose purpose is to collect/buffer small groups of pages coming from the cache (random I/Os) before moving them to the BMR. CACH-FTL is flexible, as it offers many configuration possibilities and can be adapted to the I/O workload. CACH-FTL approaches the performance of the ideal page mapping FTL, with less than 15% performance difference in most cases.
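
The routing rule described in this abstract (large flushed groups go to the BMR, small groups are buffered in the PMR and moved later) can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `CachFtlSketch`, the `threshold` parameter, and the `flush` and `migrate` methods are hypothetical and are not taken from the paper, which may use different criteria and data structures.

```python
from dataclasses import dataclass, field


@dataclass
class CachFtlSketch:
    # Hypothetical knob: minimum number of pages in a flushed group
    # for the group to be treated as sequential I/O.
    threshold: int
    # Block Mapped Region: block number -> set of page offsets.
    bmr: dict = field(default_factory=dict)
    # Page Mapped Region: one entry per buffered (block, page offset).
    pmr: dict = field(default_factory=dict)

    def flush(self, block, pages):
        """Route a group of pages flushed by the cache for one block."""
        if len(pages) >= self.threshold:
            # Large group: store it directly in the block-mapped region.
            self.bmr.setdefault(block, set()).update(pages)
            return "BMR"
        # Small group: buffer each page in the page-mapped region.
        for page in pages:
            self.pmr[(block, page)] = True
        return "PMR"

    def migrate(self):
        """Move every buffered page from the PMR to the BMR."""
        moved = len(self.pmr)
        for block, page in list(self.pmr):
            self.bmr.setdefault(block, set()).add(page)
        self.pmr.clear()
        return moved


ftl = CachFtlSketch(threshold=4)
large = ftl.flush(7, [0, 1, 2, 3, 4])   # five pages, at or above the threshold
small = ftl.flush(9, [5])               # one page, below the threshold
moved = ftl.migrate()                   # empties the PMR into the BMR
```

In this sketch the only configuration point is `threshold`; the abstract mentions many configuration possibilities, which are not modelled here.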

Citations: 8

Design and Evaluation of a Virtual Experimental Environment for Distributed Systems

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.32

L. Sarzyniec, Tom Buchert, E. Jeanvoine, L. Nussbaum

Citations: 31

Adaptive and Dynamic Quality-Aware Service Selection

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.60

D. Cavalcanti, F. N. Souza, N. Rosa

Citations: 3

VisIVO Workflow-Oriented Science Gateway for Astrophysical Visualization

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.31

E. Sciacca, M. Bandieramonte, U. Becciani, Alessandro Costa, M. Krokos, P. Massimino, C. Petta, C. Pistagna, S. Riggi, F. Vitello

Citations: 10

Prediction-Based Dynamic Resource Allocation for Video Transcoding in Cloud Computing

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.44

F. Jokhio, A. Ashraf, S. Lafond, Ivan Porres, J. Lilius

Citations: 93

Making Communication a First-Class Citizen in Multicore Partitioning

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.49

Poona Bahrebar, Ruxandra-Marina Florea, W. Heirman, Leon Denis, A. Munteanu, D. Stroobandt

Abstract: Computation-intensive image processing applications need to be implemented on multicore architectures. If they are to be executed efficiently on such platforms, the underlying data and/or functions should be partitioned and distributed among the processors. The optimal partitioning approach is the one that minimizes inter-processor communication while maximizing load balance. With the continuously increasing number of cores, which exacerbates the demand for more complex memory hierarchies, non-uniform memory access, etc., on-chip communication has gained a significant role in taking advantage of multicore chips. Therefore, making partitioning decisions based only on conventional performance results, without communication profiling, is suboptimal. In this paper, we explore the behavior of a mesh decoder as a case study in terms of communication and computation, and propose models that allow early prediction of the application's behavior. Using these models, profiling the application for all of the input samples is no longer necessary. As a result, communication- and computation-aware parallelization can be performed faster and more easily.
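
The trade-off named in this abstract, between inter-processor communication and load balance, can be made concrete with a small Python sketch. The function `partition_cost`, its arguments, and the two metrics it returns are assumptions chosen for illustration; they are not the models proposed in the paper.

```python
def partition_cost(loads, comm, assignment, n_cores):
    """Score a task-to-core assignment (hypothetical metrics).

    loads:      task -> computation cost
    comm:       (task_a, task_b) -> communication volume between the tasks
    assignment: task -> core index in range(n_cores)
    Returns (imbalance, cut): the heaviest core's load minus the mean load,
    and the total volume exchanged between tasks placed on different cores.
    """
    per_core = [0.0] * n_cores
    for task, cost in loads.items():
        per_core[assignment[task]] += cost
    imbalance = max(per_core) - sum(per_core) / n_cores
    cut = sum(
        volume
        for (a, b), volume in comm.items()
        if assignment[a] != assignment[b]
    )
    return imbalance, cut


loads = {"a": 4, "b": 4, "c": 2, "d": 2}
comm = {("a", "b"): 10, ("c", "d"): 1}

# Keeps communicating tasks together, at the price of uneven load.
grouped = partition_cost(loads, comm, {"a": 0, "b": 0, "c": 1, "d": 1}, 2)
# Balances load perfectly, but every communicating pair crosses cores.
balanced = partition_cost(loads, comm, {"a": 0, "b": 1, "c": 0, "d": 1}, 2)
```

Comparing `grouped` and `balanced` shows why a decision based on load alone can pick the assignment with the larger communication volume.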

Citations: 1

Impact of Message Based Fault Detectors on Applications Messages in a Network on Chip

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.76

Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Bernhard Fechner, T. Ungerer

Citations: 7

Parallel Computing of Kernel Density Estimation with Different Multi-core Programming Models

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.20

Panagiotis D. Michailidis, K. Margaritis

Citations: 8

3D Bubbly Flow Simulation on the GPU - Iterative Solution of a Linear System Using Sub-domain and Level-Set Deflation

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2013-02-27 DOI: 10.1109/PDP.2013.58

Rohit Gupta, M. Gijzen, C. Vuik

Citations: 5