2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP): Latest Papers, Page 3

Assessing Big Data SQL Frameworks for Analyzing Event Logs

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.26

Markku Hinkka, Teemu Lehto, Keijo Heljanko

Citations: 8

X-Ray Computed Tomography Applied to Objects of Cultural Heritage: Porting and Testing the Filtered Back-Projection Reconstruction Algorithm on Low Power Systems-on-Chip

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.60

Elena Corni, L. Morganti, M. Morigi, R. Brancaccio, M. Bettuzzi, G. Levi, E. Peccenini, D. Cesini, A. Ferraro

Abstract: The embedded and high-performance computing (HPC) sectors, which in the past were completely separate, are now converging under the pressure of two driving forces: the release of server processors that consume less power, and the increased performance of the new low-power Systems-on-Chip (SoCs) developed to meet the requirements of the demanding mobile market. This convergence allows applications that were originally confined to traditional HPC systems to be ported to low-power embedded architectures. In this paper, we present our experience of porting the Filtered Back-Projection algorithm to a low-power, low-cost system-on-chip, the NVIDIA Tegra K1, which is based on a quad-core ARM CPU and an NVIDIA Kepler GPU. The Filtered Back-Projection algorithm is heavily used in 3D tomography reconstruction software. The porting was done using several programming models (namely OpenMP and CUDA), and multiple versions of the application were developed to exploit both the SoC CPU and GPU. Performance was measured in terms of 2D slices (of a 3D volume) reconstructed per unit of time and per unit of energy. The results obtained with all the developed versions are reported and compared with those obtained on a typical x86 HPC node accelerated with a recent NVIDIA GPU. The best performance is achieved by combining the OpenMP and CUDA versions of the algorithm. In particular, we found that just three Jetson TK1 boards, connected via Gigabit Ethernet, can reconstruct as many images per unit of time as a traditional server while using one order of magnitude less energy. The results of this work can be applied, for instance, to the construction of an energy-efficient computing system for a portable tomographic apparatus.

Citations: 8

Exploring Parallel Implementations of the Bayesian Probabilistic Matrix Factorization

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.48

Imen Chakroun, Tom Haber, T. Aa, Thomas Kovac

Citations: 1

Stochastic Thermal Control of a Multicore Real-Time System

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.44

M. Mohaqeqi, M. Kargahi, K. Fouladi

Citations: 7

Accelerating Dynamic Fault Tree Analysis Based on Stochastic Logic Utilizing GPGPUs

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.130

Elham Cheshmikhani, H. Zarandi

Citations: 1

Impact of Memory-Level Parallelism on the Performance of GPU Coherence Protocols

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.67

F. Candel, S. Petit, J. Sahuquillo, J. Duato

Abstract: Graphics Processing Units (GPUs) are being implemented in heterogeneous CPU/GPU systems due to their high efficiency when executing massively parallel applications. New challenges arise in handling heterogeneous coherence in these systems because of the huge number (hundreds or thousands) of ongoing memory requests issued by GPUs, which is limited by the size of the Miss Status Holding Register (MSHR) file associated with the L1 cache. This paper analyzes how the number of MSHRs (i) affects typical memory performance metrics and (ii) impacts system performance under two recent GPU coherence protocols, NMOESI and SI (Southern Islands), which introduce distinct coherence traffic. We report two key findings that can help improve the performance of coherence protocols. First, there is a strong correlation between system performance and memory subsystem latency, regardless of the protocol used. Second, system performance varies with the number of supported cache misses; counterintuitively, however, supporting more cache misses does not always improve performance and can even cause performance drops.

Citations: 0

Conch: A Cyclic MapReduce Model for Iterative Applications

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.66

Ran Zheng, Genmao Yu, Hai Jin, Xuanhua Shi, Qin Zhang

Citations: 6

RGBCC: A New Congestion Control Mechanism for InfiniBand

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.87

Qian Liu, R. Russell

Citations: 4

DKPN: A Composite Dataflow/Kahn Process Networks Execution Model

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.34

P. Arras, D. Fuin, E. Jeannot, Samuel Thibault

Citations: 5

Simulating Search Protocols in Large-Scale Dynamic Networks

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) Pub Date : 2016-04-04 DOI: 10.1109/PDP.2016.74

S. Margariti, V. Dimakopoulos

Citations: 3