{"title":"Heterogeneous Algorithmic Skeletons for Fast Flow with Seamless Coordination over Hybrid Architectures","authors":"M. Goli, H. González-Vélez","doi":"10.1109/PDP.2013.29","DOIUrl":"https://doi.org/10.1109/PDP.2013.29","url":null,"abstract":"Algorithmic skeletons ('skeletons') abstract commonly-used patterns of parallel computation, communication, and interaction. They provide top-down design composition and control inheritance throughout the whole structure. The efficient execution of skeletal applications on a heterogeneous environment has long been of interest to the research community. Arguably, executing coarse-grained resource-intensive skeletal workloads ought to achieve higher resource utilisation and, ultimately, better job makespan on heterogeneous systems due to the structured parallelism model. This paper presents a heterogeneous OpenCL-based GPU back-end for FastFlow, a widely-used skeletal framework. Our back-end allows the user to easily write any arbitrary OpenCL code inside a heterogeneous algorithmic skeleton and seamlessly control the allocation of OpenCL kernels over the hybrid (CPU/GPU) architecture. Our performance evaluation indicates that a skeletal program which employs our back-end is around one order of magnitude faster than a skeletal parallel program using the traditional homogeneous FastFlow skeletons with the serial version of OpenCL code.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132229345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CACH-FTL: A Cache-Aware Configurable Hybrid Flash Translation Layer","authors":"Jalil Boukhobza, Pierre Olivier, S. Rubini","doi":"10.1109/PDP.2013.71","DOIUrl":"https://doi.org/10.1109/PDP.2013.71","url":null,"abstract":"Many hybrid Flash Translation Layer (FTL) schemes have been proposed to cope with the erase-before-write and limited lifetime constraints of flash memories. Those schemes try to approach page mapping performance and flexibility while seeking block mapping memory usage. Furthermore, flash-specific cache systems were designed (1) to maximize lifetime by absorbing some erase operations, and (2) to reveal sequentiality from random write operations. Indeed, random writes represent the Achilles' heel of flash memories. Both cache systems and FTL schemes were designed independently from each other. This paper presents a scalable (in terms of mapping table size) and flexible (in terms of I/O workload support) Cache-Aware Configurable Hybrid (CACH) FTL. CACH-FTL uses a common feature of flash-specific cache systems, namely flushing groups of pages from the same block. CACH-FTL partitions the flash memory space into two regions: (1) a data Block Mapped Region (BMR) collecting large groups of pages from the above cache (sequential I/Os), and (2) a small Page Mapped over-provisioning Region (PMR) whose purpose is to collect/buffer small groups of pages coming from the cache (random I/Os) before moving them to BMR. CACH-FTL is flexible as it offers many configuration possibilities and can be adapted according to the I/O workload. CACH-FTL approaches the ideal page mapping FTL performance as it gives less than 15% performance difference in most cases.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128190315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Evaluation of a Virtual Experimental Environment for Distributed Systems","authors":"L. Sarzyniec, Tom Buchert, E. Jeanvoine, L. Nussbaum","doi":"10.1109/PDP.2013.32","DOIUrl":"https://doi.org/10.1109/PDP.2013.32","url":null,"abstract":"Between simulation and experiments on real-scale testbeds, the combined use of emulation and virtualization provides a useful alternative for performing experiments on distributed systems such as clusters, grids, cloud computing or P2P systems. In this paper, we present Distem, a software tool to build distributed virtual experimental environments. Using a homogeneous set of nodes, Distem emulates a platform composed of heterogeneous nodes (in terms of number and performance of CPU cores), connected to a virtual network described using a realistic topology model. Distem relies on LXC (Linux Containers), a low-overhead container-based virtualization solution, to achieve scalability and enable experiments with thousands of virtual nodes. Distem provides a set of user interfaces to accommodate different needs (command-line for interactive use, Ruby and REST APIs), and is freely available and well documented. After a detailed description of Distem, we perform an experimental evaluation of several of its features.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134359885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive and Dynamic Quality-Aware Service Selection","authors":"D. Cavalcanti, F. N. Souza, N. Rosa","doi":"10.1109/PDP.2013.60","DOIUrl":"https://doi.org/10.1109/PDP.2013.60","url":null,"abstract":"The need for replacing services belonging to a composition is motivated by several reasons, such as changes in the application's requirements, bug fixing, existence of a fresh service and so on. Due to the large number of services having similar (or even identical) functionalities, it has been widely accepted that the selection process for a new service should also take into account non-functional requirements (QoS attributes), such as performance, availability, security and so on. Existing approaches for service selection are usually static and do not consider quality attributes, i.e., they adopt a strategy (ranking algorithm) to rank the candidate services that is usually based on functional aspects and is never altered. In this context, this paper proposes a solution that allows the ranking strategy to be changed at runtime based on historical data of quality attributes.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122576460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VisIVO Workflow-Oriented Science Gateway for Astrophysical Visualization","authors":"E. Sciacca, M. Bandieramonte, U. Becciani, Alessandro Costa, M. Krokos, P. Massimino, C. Petta, C. Pistagna, S. Riggi, F. Vitello","doi":"10.1109/PDP.2013.31","DOIUrl":"https://doi.org/10.1109/PDP.2013.31","url":null,"abstract":"Nowadays visualization-based knowledge discovery can play an important role in astrophysics. Collaborative visualization can enable multiple users to share visualization experiences, e.g. by interacting simultaneously with astrophysical datasets giving feedback on what other participants are doing/seeing. Further, workflow-driven applications allow reproduction of specific visualization results, a challenging task as selecting suitable visualization parameters may not be a straightforward process. This paper presents VisIVO Science Gateway, a web-based workflow-enabled framework integrating large-scale, multidimensional datasets and applications for visualization and data filtering on Distributed Computing Infrastructures (DCIs). Advanced users are able to create, change, invoke, and monitor workflows while standard users are provided with easy-to-use customised web interfaces hiding all technical aspects of the visualization algorithms and DCI configurations.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125365009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction-Based Dynamic Resource Allocation for Video Transcoding in Cloud Computing","authors":"F. Jokhio, A. Ashraf, S. Lafond, Ivan Porres, J. Lilius","doi":"10.1109/PDP.2013.44","DOIUrl":"https://doi.org/10.1109/PDP.2013.44","url":null,"abstract":"This paper presents prediction-based dynamic resource allocation algorithms to scale video transcoding service on a given Infrastructure as a Service cloud. The proposed algorithms provide mechanisms for allocation and deallocation of virtual machines (VMs) to a cluster of video transcoding servers in a horizontal fashion. We use a two-step load prediction method, which allows proactive resource allocation with high prediction accuracy under real-time constraints. For cost-efficiency, our work supports transcoding of multiple on-demand video streams concurrently on a single VM, resulting in a reduced number of required VMs. We use video segmentation at group of pictures level, which splits video streams into smaller segments that can be transcoded independently of one another. The approach is demonstrated in a discrete-event simulation and an experimental evaluation involving two different load patterns.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116620119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making Communication a First-Class Citizen in Multicore Partitioning","authors":"Poona Bahrebar, Ruxandra-Marina Florea, W. Heirman, Leon Denis, A. Munteanu, D. Stroobandt","doi":"10.1109/PDP.2013.49","DOIUrl":"https://doi.org/10.1109/PDP.2013.49","url":null,"abstract":"Computation-intensive image processing applications need to be implemented on multicore architectures. If they are to be executed efficiently on such platforms, the underlying data and/or functions should be partitioned and distributed among the processors. The optimal partitioning approach is the one which aims to minimize the inter-processor communication while maximizing the load balance. With the continuously increasing number of cores which exacerbates the demand for more complex memory hierarchies, non-uniform memory access, etc., on-chip communication has gained a significant role in taking advantage of the multicore chips. Therefore, making partitioning decisions just based on conventional performance results and without communication profiling is suboptimal. In this paper, we explore the behavior of a mesh decoder as a case study in terms of communication and computation, and propose models that allow early prediction of the application's behavior. Using these models, profiling the application for all of the input samples is not necessary anymore. As a result, communication- and computation-aware parallelization could be performed faster and more easily.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125510033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Message Based Fault Detectors on Applications Messages in a Network on Chip","authors":"Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Bernhard Fechner, T. Ungerer","doi":"10.1109/PDP.2013.76","DOIUrl":"https://doi.org/10.1109/PDP.2013.76","url":null,"abstract":"Future many-cores will accommodate a high number of cores, but the tera-scale transistors increase the failure rates in cores and interconnection networks of such chips. Message-based fault detection techniques have been developed to mitigate the influence of faults on the system. In this paper, we investigate the message overhead for fault detection monitoring with decentralized Fault Detection Units in a unified 2D-mesh and assess the resulting delays of application messages. We investigate routing algorithms for different message types and demonstrate a 19% reduction of the impact of fault detection messages on application messages. We also show the limitations of prioritized fault detection messages for different application message packet injection rates.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114941399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Computing of Kernel Density Estimation with Different Multi-core Programming Models","authors":"Panagiotis D. Michailidis, K. Margaritis","doi":"10.1109/PDP.2013.20","DOIUrl":"https://doi.org/10.1109/PDP.2013.20","url":null,"abstract":"Kernel density estimation is nowadays a very popular tool for nonparametric probabilistic density estimation. One of its most important disadvantages is the computational complexity of the computations needed, especially for large data sets. One way to accelerate these computations is to use parallel computing on multi-core platforms. In this paper we parallelize two kernel estimation methods, univariate and multivariate kernel estimation, from the field of computational econometrics on a multi-core platform using different programming frameworks such as Pthreads, OpenMP, Intel Cilk++, Intel TBB, SWARM and FastFlow. The purpose of this paper is to present an extensive quantitative (i.e., performance) and qualitative (i.e., the ease of programming effort) study of the multi-core programming frameworks for these two kernel estimation methods.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121935708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Bubbly Flow Simulation on the GPU - Iterative Solution of a Linear System Using Sub-domain and Level-Set Deflation","authors":"Rohit Gupta, M. Gijzen, C. Vuik","doi":"10.1109/PDP.2013.58","DOIUrl":"https://doi.org/10.1109/PDP.2013.58","url":null,"abstract":"Solving an ill-conditioned linear system with a two level preconditioned Conjugate Gradient method on the GPU presents many options. The viability of these options is studied for different bubbly flow problems. On the basis of experiments conducted, we propose strategies that make our approach computationally suitable. We use the Truncated Neumann series based preconditioning scheme in combination with Deflation for implementing the two-level preconditioned Conjugate Gradient method and test different configurations on a unit cube with varying number of bubbles. Our results exhibit up to an order of magnitude speedup on the GPU. Our preconditioning scheme combined with deflation proves competitive (in terms of computation time and convergence) when compared to deflation with Incomplete Cholesky preconditioning.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122347368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}