2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Papers

Heterogeneous Algorithmic Skeletons for Fast Flow with Seamless Coordination over Hybrid Architectures
M. Goli, H. González-Vélez
DOI: https://doi.org/10.1109/PDP.2013.29
Published: 2013-02-27
Abstract: Algorithmic skeletons ('skeletons') abstract commonly used patterns of parallel computation, communication, and interaction. They provide top-down design composition and control inheritance throughout the whole structure. The efficient execution of skeletal applications on a heterogeneous environment has long been of interest to the research community. Arguably, executing coarse-grained, resource-intensive skeletal workloads ought to achieve higher resource utilisation and, ultimately, better job makespan on heterogeneous systems due to the structured parallelism model. This paper presents a heterogeneous OpenCL-based GPU back-end for FastFlow, a widely used skeletal framework. Our back-end allows the user to easily write arbitrary OpenCL code inside a heterogeneous algorithmic skeleton and seamlessly control the allocation of OpenCL kernels over the hybrid (CPU/GPU) architecture. Our performance evaluation indicates that a skeletal program which employs our back-end is around one order of magnitude faster than a skeletal parallel program using the traditional homogeneous FastFlow skeletons with the serial version of the OpenCL code.
Citations: 18
CACH-FTL: A Cache-Aware Configurable Hybrid Flash Translation Layer
Jalil Boukhobza, Pierre Olivier, S. Rubini
DOI: https://doi.org/10.1109/PDP.2013.71
Published: 2013-02-27
Abstract: Many hybrid Flash Translation Layer (FTL) schemes have been proposed to leverage the erase-before-write and limited lifetime constraints of flash memories. Those schemes try to approach page mapping performance and flexibility while seeking block mapping memory usage. Furthermore, flash-specific cache systems were designed (1) to maximize lifetime by absorbing some erase operations, and (2) to reveal sequentiality from random write operations. Indeed, random writes represent the Achilles' heel of flash memories. Cache systems and FTL schemes were designed independently of each other. This paper presents a scalable (in terms of mapping table size) and flexible (in terms of I/O workload support) Cache-Aware Configurable Hybrid (CACH) FTL. CACH-FTL uses a common feature of flash-specific cache systems, namely flushing groups of pages from the same block. CACH-FTL partitions the flash memory space into two regions: (1) a data Block Mapped Region (BMR) collecting large groups of pages from the above cache (sequential I/Os), and (2) a small Page Mapped over-provisioning Region (PMR) whose purpose is to collect/buffer small groups of pages coming from the cache (random I/Os) before moving them to the BMR. CACH-FTL is flexible, as it offers many configuration possibilities and can be adapted according to the I/O workload. CACH-FTL approaches the ideal page mapping FTL performance, as it gives less than 15% performance difference in most cases.
Citations: 8
Design and Evaluation of a Virtual Experimental Environment for Distributed Systems
L. Sarzyniec, Tom Buchert, E. Jeanvoine, L. Nussbaum
DOI: https://doi.org/10.1109/PDP.2013.32
Published: 2013-02-27
Abstract: Between simulation and experiments on real-scale testbeds, the combined use of emulation and virtualization provides a useful alternative for performing experiments on distributed systems such as clusters, grids, cloud computing or P2P systems. In this paper, we present Distem, a software tool to build distributed virtual experimental environments. Using a homogeneous set of nodes, Distem emulates a platform composed of heterogeneous nodes (in terms of number and performance of CPU cores), connected to a virtual network described using a realistic topology model. Distem relies on LXC (Linux Containers), a low-overhead container-based virtualization solution, to achieve scalability and enable experiments with thousands of virtual nodes. Distem provides a set of user interfaces to accommodate different needs (command-line for interactive use, Ruby and REST APIs), and is freely available and well documented. After a detailed description of Distem, we perform an experimental evaluation of several of its features.
Citations: 31
Adaptive and Dynamic Quality-Aware Service Selection
D. Cavalcanti, F. N. Souza, N. Rosa
DOI: https://doi.org/10.1109/PDP.2013.60
Published: 2013-02-27
Abstract: The need to replace services belonging to a composition is motivated by several reasons, such as changes in the application's requirements, bug fixing, the existence of a fresh service and so on. Due to the large number of services having similar (or even identical) functionalities, it has been widely accepted that the selection process for a new service should also take into account non-functional requirements (QoS attributes), such as performance, availability, security and so on. Existing approaches for service selection are usually static and do not consider quality attributes, i.e., they adopt a strategy (ranking algorithm) to rank the candidate services that is usually based on functional aspects and is never altered. In this context, this paper proposes a solution that allows the ranking strategy to be changed at runtime based on historical data of quality attributes.
Citations: 3
VisIVO Workflow-Oriented Science Gateway for Astrophysical Visualization
E. Sciacca, M. Bandieramonte, U. Becciani, Alessandro Costa, M. Krokos, P. Massimino, C. Petta, C. Pistagna, S. Riggi, F. Vitello
DOI: https://doi.org/10.1109/PDP.2013.31
Published: 2013-02-27
Abstract: Nowadays, visualization-based knowledge discovery can play an important role in astrophysics. Collaborative visualization can enable multiple users to share visualization experiences, e.g. by interacting simultaneously with astrophysical datasets and giving feedback on what other participants are doing/seeing. Further, workflow-driven applications allow reproduction of specific visualization results, a challenging task as selecting suitable visualization parameters may not be a straightforward process. This paper presents VisIVO Science Gateway, a web-based workflow-enabled framework integrating large-scale, multidimensional datasets and applications for visualization and data filtering on Distributed Computing Infrastructures (DCIs). Advanced users are able to create, change, invoke, and monitor workflows, while standard users are provided with easy-to-use customised web interfaces hiding all technical aspects of the visualization algorithms and DCI configurations.
Citations: 10
Prediction-Based Dynamic Resource Allocation for Video Transcoding in Cloud Computing
F. Jokhio, A. Ashraf, S. Lafond, Ivan Porres, J. Lilius
DOI: https://doi.org/10.1109/PDP.2013.44
Published: 2013-02-27
Abstract: This paper presents prediction-based dynamic resource allocation algorithms to scale a video transcoding service on a given Infrastructure as a Service cloud. The proposed algorithms provide mechanisms for allocation and deallocation of virtual machines (VMs) to a cluster of video transcoding servers in a horizontal fashion. We use a two-step load prediction method, which allows proactive resource allocation with high prediction accuracy under real-time constraints. For cost-efficiency, our work supports transcoding of multiple on-demand video streams concurrently on a single VM, resulting in a reduced number of required VMs. We use video segmentation at the group-of-pictures level, which splits video streams into smaller segments that can be transcoded independently of one another. The approach is demonstrated in a discrete-event simulation and an experimental evaluation involving two different load patterns.
Citations: 93
Making Communication a First-Class Citizen in Multicore Partitioning
Poona Bahrebar, Ruxandra-Marina Florea, W. Heirman, Leon Denis, A. Munteanu, D. Stroobandt
DOI: https://doi.org/10.1109/PDP.2013.49
Published: 2013-02-27
Abstract: Computation-intensive image processing applications need to be implemented on multicore architectures. If they are to be executed efficiently on such platforms, the underlying data and/or functions should be partitioned and distributed among the processors. The optimal partitioning approach is the one which aims to minimize the inter-processor communication while maximizing the load balance. With the continuously increasing number of cores, which exacerbates the demand for more complex memory hierarchies, non-uniform memory access, etc., on-chip communication has gained a significant role in taking advantage of multicore chips. Therefore, making partitioning decisions based only on conventional performance results and without communication profiling is suboptimal. In this paper, we explore the behavior of a mesh decoder as a case study in terms of communication and computation, and propose models that allow early prediction of the application's behavior. Using these models, profiling the application for all of the input samples is no longer necessary. As a result, communication- and computation-aware parallelization can be performed faster and more easily.
Citations: 1
Impact of Message Based Fault Detectors on Applications Messages in a Network on Chip
Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Bernhard Fechner, T. Ungerer
DOI: https://doi.org/10.1109/PDP.2013.76
Published: 2013-02-27
Abstract: Future many-cores will accommodate a high number of cores, but the tera-scale number of transistors increases the failure rates in the cores and interconnection networks of such chips. Message-based fault detection techniques have been developed to mitigate the influence of faults on the system. In this paper, we investigate the message overhead for fault detection monitoring with decentralized Fault Detection Units in a unified 2D-mesh and assess the resulting delays of application messages. We investigate routing algorithms for different message types and demonstrate a 19% reduction of the impact of fault detection messages on application messages. We also show the limitations of prioritized fault detection messages for different application message packet injection rates.
Citations: 7
Parallel Computing of Kernel Density Estimation with Different Multi-core Programming Models
Panagiotis D. Michailidis, K. Margaritis
DOI: https://doi.org/10.1109/PDP.2013.20
Published: 2013-02-27
Abstract: Kernel density estimation is nowadays a very popular tool for nonparametric probabilistic density estimation. One of its most important disadvantages is the computational complexity of the computations needed, especially for large data sets. One way of accelerating these computations is to use parallel computing on multi-core platforms. In this paper we parallelize two kernel estimation methods, univariate and multivariate kernel estimation, from the field of computational econometrics on a multi-core platform using different programming frameworks such as Pthreads, OpenMP, Intel Cilk++, Intel TBB, SWARM and FastFlow. The purpose of this paper is to present an extensive quantitative (i.e., performance) and qualitative (i.e., ease of programming effort) study of the multi-core programming frameworks for these two kernel estimation methods.
Citations: 8
3D Bubbly Flow Simulation on the GPU - Iterative Solution of a Linear System Using Sub-domain and Level-Set Deflation
Rohit Gupta, M. Gijzen, C. Vuik
DOI: https://doi.org/10.1109/PDP.2013.58
Published: 2013-02-27
Abstract: Solving an ill-conditioned linear system with a two-level preconditioned Conjugate Gradient method on the GPU presents many options. The viability of these options is studied for different bubbly flow problems. On the basis of the experiments conducted, we propose strategies that make our approach computationally suitable. We use the Truncated Neumann series based preconditioning scheme in combination with Deflation for implementing the two-level preconditioned Conjugate Gradient method and test different configurations on a unit cube with a varying number of bubbles. Our results exhibit up to an order of magnitude speedup on the GPU. Our preconditioning scheme combined with deflation proves competitive (in terms of computation time and convergence) when compared to deflation with Incomplete Cholesky preconditioning.
Citations: 5