2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Papers (Page 6)

Evaluation of Successive CPUs/APUs/GPUs Based on an OpenCL Finite Difference Stencil

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.65

H. Calandra, R. Dolbeau, P. Fortin, J. Lamotte, Issam Said

Abstract: The AMD APU (Accelerated Processing Unit) architecture, which combines CPU and GPU cores on the same die, is promising for GPU applications whose performance is bottlenecked by the low PCI Express communication rate. However, the first APU generations still have separate CPU and GPU memory partitions. Currently, the APU integrated GPUs are also less powerful than discrete GPUs. In this paper we therefore investigate the value of APUs for scientific computing by evaluating and comparing the performance of two successive AMD APUs (family codenames Llano and Trinity), two successive discrete GPUs (chip codenames Cayman and Tahiti) and one hexa-core AMD CPU. For this purpose, we rely on a 3D finite difference stencil that is optimized and tuned in OpenCL. We detail the most interesting optimizations for each architecture and show very good performance in OpenCL: up to 500 Gflops on Tahiti. Finally, our results show that APU integrated GPUs outperform CPUs, and that integrated GPUs of upcoming APUs may match discrete GPUs for problems with high communication requirements.
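For readers unfamiliar with the kind of kernel this abstract refers to, the following is a minimal sketch of a 3D finite difference stencil, written in plain Python rather than OpenCL. It assumes the lowest-order 7-point Laplacian on a uniform grid; the abstract does not state the stencil order or the data layout used in the paper, so the function names, the nested-list layout `u[z][y][x]` and the grid spacing `h` are all illustrative assumptions.

```python
def laplacian_3d(u, h=1.0):
    """Apply a 7-point, second-order finite-difference Laplacian.

    u is a nested list indexed as u[z][y][x] (an assumed layout).
    Only interior points are updated; boundary points stay at 0.0.
    """
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    inv_h2 = 1.0 / (h * h)
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # Sum the six axis neighbours and subtract 6x the centre value.
                out[z][y][x] = inv_h2 * (
                    u[z][y][x - 1] + u[z][y][x + 1]
                    + u[z][y - 1][x] + u[z][y + 1][x]
                    + u[z - 1][y][x] + u[z + 1][y][x]
                    - 6.0 * u[z][y][x]
                )
    return out


def make_quadratic_grid(n):
    """Hypothetical test field f(x, y, z) = x^2 + y^2 + z^2 on an n^3 grid."""
    return [[[float(x * x + y * y + z * z) for x in range(n)]
             for y in range(n)]
            for z in range(n)]


if __name__ == "__main__":
    lap = laplacian_3d(make_quadratic_grid(5))
    print(lap[2][2][2])  # the exact Laplacian of x^2 + y^2 + z^2 is 6
```

Each output point reads seven input values and performs only a handful of arithmetic operations, which is why kernels of this shape tend to be limited by memory traffic, and why the memory organization of APUs and discrete GPUs matters for them.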

Citations: 15

Block Level Storage Support for Open Source IaaS Clouds

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.45

S. Ács, M. Gergely, P. Kacsuk, M. Kozlovszky

Citations: 3

Concurrent Collections on Distributed Memory Theory Put into Practice

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.40

F. Schlimbach, James C. Brodman, K. Knobe

Abstract: Finding and expressing scalable parallelism is a non-trivial task; in fact, it is one of the most difficult parts of software development. Concurrent Collections (CnC) is a novel programming model which aims to make this easy. Its higher-level abstractions expose available parallelism implicitly by specifying the semantically required dependencies between individual computation kernels. It has been shown conceptually to be deterministic, independent of the target platform, and to separate program semantics from tuning. While these properties are evident in the abstract, there have been no concrete implementations yet which show that these concepts are generally exploitable in practice. We developed an implementation of CnC which exposes these benefits in a single model for both shared and distributed memory. Additionally, we provide a tuning interface which allows defining and optimizing distribution plans easily and flexibly. Unlike most approaches, our implementation allows changing the distribution without altering the computation code itself. This makes development very productive because it separates the concerns of program semantics and tuning. Last but not least, we show that the new mechanisms not only preserve CnC's deterministic model but are also capable of providing competitive performance. We ported several applications and ran them on a cluster of multi-cores. Our results show that CnC performance matches and often exceeds that of existing state-of-the-art models.

Citations: 10

Fault Localizing End-to-End Flow Control Protocol for Networks-on-Chip

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.74

G. Schley, N. Batzolis, M. Radetzki

Abstract: Reliable data exchange between the cores of a Network-on-Chip (NoC) is of great importance for correct system behavior. However, data exchange is hampered by transient and permanent faults in the NoC's communication structure (links). These faults may cause corruption or loss of data, which in turn may lead to performance degradation or, in the worst case, to complete system failure. When data is corrupted by a transient fault, a common measure is to retransmit the data. To ensure that faulty data is retransmitted, so-called flow control protocols are applied. In the case of permanent faults, a simple retransmission is not sufficient. Permanent faults in, e.g., links lead to permanent corruption of data as long as they are not located. Thus, even retransmissions get corrupted. In this paper we present a fault-tolerant end-to-end protocol applicable to arbitrary NoC topologies. It ensures reliable end-to-end communication in the presence of transient and permanent faults in the interconnection structure. By means of the protocol's online diagnostic ability, it is capable of locating faulty links and switches without any additional diagnosis hardware.
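The distinction the abstract draws between transient and permanent faults can be illustrated with a generic end-to-end retransmission loop. This is a minimal sketch and not the protocol proposed in the paper: the packet format (one sequence byte, payload, CRC-32 trailer), the `FlakyLink` fault model and the `max_attempts` limit are all assumptions made for illustration.

```python
import zlib


def make_packet(seq, payload):
    """Build a packet: 1-byte sequence number, payload, 4-byte CRC-32 trailer."""
    body = bytes([seq]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_packet(packet):
    """Return (seq, payload) if the CRC matches, otherwise None."""
    body, crc = packet[:-4], packet[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return body[0], body[1:]


class FlakyLink:
    """Hypothetical link that flips one bit in the first `faults` packets."""

    def __init__(self, faults):
        self.faults = faults

    def carry(self, packet):
        if self.faults > 0:
            self.faults -= 1
            corrupted = bytearray(packet)
            corrupted[1] ^= 0x01  # single-bit error, always caught by CRC-32
            return bytes(corrupted)
        return packet


def send_reliably(link, seq, payload, max_attempts=4):
    """Retransmit until the receiver's CRC check passes or attempts run out.

    Returns (payload, attempts_used) on success, (None, max_attempts) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        received = check_packet(link.carry(make_packet(seq, payload)))
        if received is not None:
            return received[1], attempt
    return None, max_attempts


if __name__ == "__main__":
    # Transient fault: two corrupted transfers, then the link recovers.
    print(send_reliably(FlakyLink(2), 7, b"flit"))
    # Permanent fault: every retransmission over the same link is corrupted.
    print(send_reliably(FlakyLink(10**9), 7, b"flit"))
```

With a transient fault the loop succeeds after a few retries. With a permanent fault every attempt fails, because the retransmission crosses the same broken link, which is the situation the abstract says requires locating the fault rather than simply resending.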

Citations: 11

Paralysis: An Extensible Multi-tiered Guidance Environment for Program Parallelization and Analysis

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.64

Stuart McCool, Ran Shao, P. Milligan, F. Kurugollu

Abstract: The heterogeneous computing revolution continues unabated. Yet despite the vast number of naïve users in possession of bespoke software hoping to embrace the opportunities that this revolution has wrought, few approaches proposed in current literature can guide such users in these efforts. The most appropriate choice would appear to be a (semi-)automating compiler. However, these typically target a single device type and demand the unguided use of directives. Consequently, they are of little use when naïve users are seeking answers to more fundamental questions, such as: which fragments of a program can or should be parallelized, which device should each fragment target, and what speedup will be attained. To this end, this paper expands on previous work and proposes Paralysis, an extensible guidance environment, tiered for varying programmer competencies, with support for static and dynamic analysis techniques. At the highest level, guided user experiences are paramount. At the lowest level, underlying functionality is exposed as a set of plug-ins, ensuring longevity. A partial prototype, built atop the Cetus infrastructure, is described. It is used to analyze two serial programs for CUDA execution: the DFT and the Box Blur Filter. Speedups of 15x and 22x are achieved on the basis of the analysis.

Citations: 0

Impact of Data Structure Layout on Performance

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.24

Nuno Faria, Rui C. Silva, J. Sobral

Citations: 17

ReStream - A Replication Algorithm for Reliable and Scalable Multimedia Streaming

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.19

Shabnam Ataee, B. Garbinato, F. Pedone

Citations: 3

The HPC Testbed of the Italian Grid Infrastructure

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.42

R. Alfieri, S. Arezzini, G. Barone, U. Becciani, M. Bencivenni, V. Boccia, D. Bottalico, L. Carracciuolo, D. Cesini, A. Ciampa, A. Costantini, S. Cozzini, R. Pietri, M. Drudi, A. Ghiselli, E. Mazzoni, S. Ottani, A. Venturini, P. Veronesi

Abstract: Even though the Italian Grid Infrastructure (IGI) is a general-purpose distributed platform, in the past it has been used mainly for serial computations. Parallel applications have typically been executed on supercomputer facilities or, in the case of "not high-end" HPC applications, on local commodity parallel clusters. Nowadays, with the availability of multi-core processors, Grid computing is becoming very attractive for parallel applications as well, but some problems exist in supporting HPC applications in a Grid environment. Here we describe the work done to set up an HPC testbed for "not high-end" HPC applications, based on IGI Grid technologies, to find solutions to those problems. Participating sites were selected from among those running HPC clusters in a Grid environment. Each of them contributed its specific HPC experience and its available resources to the present test, which encompasses an unprecedentedly large set of applications from different disciplines in the fields of astronomy, astrophysics, chemistry, climatology, material science and oceanography. In addition to sharing computing resources, the main contribution of each participant was to identify the real requirements of their application, including those related to current middleware limitations, and then to realize a test platform enhanced with additional HPC solutions and configurations developed in tight collaboration between HPC administrators, users and IGI managers. The main work was on the selection of computational resources, data management, and the definition, deployment and documentation of the software execution environment. The results of the testbed form the basis of the HPC support in the IGI production infrastructure.

Citations: 3

A GPU Algorithm Design for Resource Constrained Project Scheduling Problem

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.59

L. Bukata, P. Šůcha

Citations: 6

High Performance Fault-Tolerant Routing Algorithm for NoC-Based Many-Core Systems

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2013-02-27. DOI: 10.1109/PDP.2013.75

M. Ebrahimi, M. Daneshtalab, J. Plosila

Abstract: Networks-on-Chip (NoCs) have become a promising approach for the on-chip communication infrastructure of many-core Systems-on-Chip (SoCs). Faults may occur in the NoC at both the router and the link level. Many fault-tolerant approaches have been presented for both off-chip and on-chip networks. Some approaches disable some healthy components in order to form a specific shape, and others do not. Regardless of these variations, there has always been a common assumption among them: most traditional fault-tolerant methods are based on rerouting packets around a faulty node or region. These approaches affect performance significantly, not only by taking longer paths but also by creating hotspots around a fault. The focus of this paper is to maintain the performance of the NoC in the presence of faults. The presented method takes advantage of a fully adaptive routing algorithm using one and two virtual channels along the X and Y dimensions. This method is able to tolerate all cases of a single faulty node without losing NoC performance. According to the experimental results, the presented fault-tolerant routing algorithm is able to support up to six faulty nodes in an 8×8 mesh network with up to 98% reliability.

Citations: 44