2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Articles, Page 9

CUDA Dynamic Active Thread List Strategy to Accelerate Debris Flow Simulations

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.103

G. Filippone, W. Spataro, D. D'Ambrosio, D. Spataro, D. Marocco, G. Trunfio

Citations: 6

Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.37

G. Utrera, Marisa Gil, X. Martorell

Citations: 3

A Clustered GALS NoC Architecture with Communication-Aware Mapping

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.113

Kazem Cheshmi, S. Mohammadi, D. Versick, D. Tavangarian, Jelena Trajkovic

Abstract: As processors migrate to multi- and many-core architectures, the role of the communication network becomes more important. An efficient communication architecture can drastically improve overall system performance. Taking the application behavior into account can facilitate system-level solutions that manage the communication cost. To address this issue, we propose a Clustered Globally Asynchronous Locally Synchronous Network-on-Chip (C-GALS NoC) communication architecture. C-GALS NoC is composed of local, synchronous clusters and a global asynchronous network. Additionally, we propose a cluster-based communication-aware mapping algorithm (CAM) for mapping the application tasks to the C-GALS NoC while minimizing the communication cost. The synergy of the C-GALS NoC and the CAM algorithm results in a system-level mechanism that, according to our results, provides up to 2x and 3x improvement in performance and power, respectively, in comparison with a regular GALS NoC. Finally, we demonstrate that C-GALS NoC is standard-cell compatible by synthesizing it using Design Compiler.

Citations: 7

FIST: A Framework to Interleave Spiking Neural Networks on CGRAs

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.60

Tuan Ngyen, Syed M. A. H. Jafri, M. Daneshtalab, A. Hemani, Sergei Dytckov, J. Plosila, H. Tenhunen

Abstract: Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern embedded applications. In many application domains (e.g. robotics and cognitive embedded systems), the CGRAs are required to simultaneously host processing (e.g. audio/video acquisition) and estimation (e.g. audio/video/image recognition) tasks. Recent works have revealed that the efficiency and scalability of the estimation algorithms can be significantly improved by using neural networks. However, existing CGRAs commonly employ homogeneous processing resources for both tasks. To realize the best of both worlds (conventional processing and neural networks), we present FIST. FIST allows the processing elements and the network to dynamically morph into either a conventional CGRA or a neural network, depending on the hosted application. We have chosen the DRRA as a vehicle to study the feasibility and overheads of our approach. Synthesis results reveal that the proposed enhancements incur negligible overheads (4.4% area and 9.1% power) compared to the original DRRA cell.

Citations: 5

Methodologies and Performance Metrics to Evaluate Multiprogram Workloads

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.74

Vicent Selfa, J. Sahuquillo, Crispín Gómez Requena, M. E. Gómez

Abstract: Multicore processors are dominating the microprocessor market and most research work has moved to this kind of processor. Multicore research methods are still immature and evolving from their single-threaded processor counterparts. Three main research issues must be faced when evaluating performance and energy in multicores. First, multiple simulation methodologies are being applied to evaluate these systems, without agreement on which to use. Second, due to the nature of multiprogram workloads, new performance metrics are required, different from those used in single-thread processors. Many metrics have been defined and distinct metrics are used across the published works. Finally, multicore processors are complex systems that require sophisticated and complementary (e.g. energy and performance) simulators. This paper aims to help researchers address these three research issues. For this purpose, we compare these issues across 28 papers published in 2013 in top computer architecture conferences. Both analytical examples and experimental results are presented with the aim of providing some insights into multicore research.

Citations: 3

A Hadoop-Based Framework for Large-Scale Landmine Detection Using Ubiquitous Big Satellite Imaging Data

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.121

S. El-Kazzaz, Ahmed El-Mahdy

Citations: 2

Marching Band: Fault-Tolerance with Replicable Message Delivery Order

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.52

Arkadiusz Danilecki

Citations: 0

On the Impact of Energy-Efficient Strategies in HPC Clusters

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.122

F. Rossi, Miguel G. Xavier, Yuri J. Monti, C. Rose

Citations: 5

Using Active Data to Provide Smart Data Surveillance to E-Science Users

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.76

Anthony Simonet, K. Chard, G. Fedak, Ian T Foster

Citations: 7

An Efficient Algorithm for Communication-Based Task Mapping

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Pub Date : 2015-03-04 DOI: 10.1109/PDP.2015.25

E. Cruz, M. Diener, L. Pilla, P. Navaux

Abstract: The communication between tasks of a parallel application is an important characteristic to consider when mapping tasks to computing cores due to possible differences in communication performance. Within a machine, performance differences are introduced by the memory hierarchy, in which cache memories can be shared by groups of cores and intra-chip interconnections are faster than inter-chip interconnections. In cluster and grid systems, the network imposes an additional communication latency. By mapping tasks that communicate to cores nearby on the memory hierarchy, or to the same nodes in clusters or grids, the communication of parallel applications is optimized, leading to increased performance and energy efficiency. In the task mapping context, one of the most important aspects to be considered is the mapping algorithm, as it determines the improvements that can be achieved. Since the problem of finding the best mapping is NP-Hard, heuristics must be employed to find an approximate solution in feasible time. In this paper, we present Eager Map, a new algorithm to perform communication-based mapping that is based on a greedy grouping strategy applied hierarchically. Experimental evaluation indicates that our algorithm runs 10 times faster than the state of the art and delivers higher performance improvements. Due to its low execution time and high stability, Eager Map is also suitable for online task mapping, where tasks are migrated during execution.

Citations: 30