2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Publications

CUDA Dynamic Active Thread List Strategy to Accelerate Debris Flow Simulations
G. Filippone, W. Spataro, D. D'Ambrosio, D. Spataro, D. Marocco, G. Trunfio
DOI: 10.1109/PDP.2015.103 (https://doi.org/10.1109/PDP.2015.103)
Published: 2015-03-04
Abstract: Cellular Automata (CA) provide a formal framework for dynamical systems that evolve on the basis of local interactions. We present first results of the CUDA parallelization of the SCIDDICA S3-hex Complex Cellular Automata model for simulating debris flows. The first parallelization strategy is a straightforward one-thread-one-cell approach, in which each cell of the cellular space is computed by its own CUDA thread. The second strategy maintains a list of computationally active CA cells, updated at every step by an efficient stream compaction algorithm, so that fewer threads are wasted on computationally inactive cells. First results on different graphics processors show that, with these CUDA strategies, this kind of hardware can be effective for landslide risk mitigation.
Citations: 6
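The active-cell list in the second strategy relies on stream compaction. A minimal, sequential Python sketch of the usual scan-based formulation (exclusive prefix sum over activity flags, then a conflict-free scatter) is shown below; it is not the authors' CUDA code, and the function names and the 1D neighbor rule in `next_active` are hypothetical simplifications of the hexagonal SCIDDICA neighborhood.

```python
from itertools import accumulate


def compact_active(flags):
    """Return the indices i where flags[i] is truthy, using the
    scan-then-scatter pattern that GPU stream compaction follows."""
    ints = [1 if f else 0 for f in flags]
    # Exclusive scan: offsets[i] = number of active cells before cell i.
    offsets = [0] + list(accumulate(ints))[:-1] if ints else []
    total = offsets[-1] + ints[-1] if ints else 0
    out = [0] * total
    # Scatter: each active cell writes its index into its own slot.
    # On a GPU, one thread per cell would do this in parallel.
    for i, active in enumerate(ints):
        if active:
            out[offsets[i]] = i
    return out


def next_active(mass):
    """Hypothetical 1D rule: a cell is active if it or a neighbor holds mass."""
    n = len(mass)
    flags = [
        mass[i] > 0
        or (i > 0 and mass[i - 1] > 0)
        or (i < n - 1 and mass[i + 1] > 0)
        for i in range(n)
    ]
    return compact_active(flags)
```

Launching one thread per entry of the compacted list, rather than one per cell, is what lets the second strategy skip inactive cells.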
Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication
G. Utrera, Marisa Gil, X. Martorell
DOI: 10.1109/PDP.2015.37 (https://doi.org/10.1109/PDP.2015.37)
Published: 2015-03-04
Abstract: HPC applications make intensive use of large sparse matrices, and the matrix-vector product accounts for a significant fraction of total run time. These matrices have non-uniform structures and cause irregular memory accesses, which make good scalability hard to achieve on modern HPC platforms with multi- or many-cores, SIMD units, and high-speed communication networks. One cause of this limited scalability is communication imbalance, in both message synchronization and message size. In this work we analyze this load imbalance in the sparse matrix-vector product (SpMV) running on a multi-node cluster with high-speed interconnection networks. Experimental alternatives for reducing communication load imbalance are evaluated on two programming models (MPI+fork-join and MPI+task-based parallelism) using specific optimizations, namely computation-communication overlap and sending messages in parallel. For large matrices, the performance improvement reaches up to 9%.
Citations: 3
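To show where message-size imbalance comes from, here is a minimal Python sketch of a CSR (compressed sparse row) SpMV plus a count of the remote vector entries each rank would need under a block-row distribution. Both the CSR layout and the block-row partitioning are assumptions for illustration; this is not the authors' MPI implementation.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for A stored in CSR format."""
    y = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            s += vals[k] * x[col_idx[k]]
        y.append(s)
    return y


def remote_entries_per_rank(row_ptr, col_idx, n, nranks):
    """For an n x n matrix split into contiguous row blocks (with x split
    the same way), count the distinct x entries each rank must receive
    from other ranks. Unequal counts mean unequal message sizes."""
    block = (n + nranks - 1) // nranks
    counts = []
    for p in range(nranks):
        lo, hi = p * block, min((p + 1) * block, n)
        needed = {
            col_idx[k]
            for r in range(lo, hi)
            for k in range(row_ptr[r], row_ptr[r + 1])
            if col_idx[k] // block != p  # column owned by another rank
        }
        counts.append(len(needed))
    return counts
```

For example, with rows `[4, 0, 0, 1]`, `[0, 4, 0, 0]`, `[0, 0, 4, 0]`, `[1, 1, 0, 4]` split over two ranks, rank 0 needs one remote entry and rank 1 needs two, so the two ranks exchange differently sized messages.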
A Clustered GALS NoC Architecture with Communication-Aware Mapping
Kazem Cheshmi, S. Mohammadi, D. Versick, D. Tavangarian, Jelena Trajkovic
DOI: 10.1109/PDP.2015.113 (https://doi.org/10.1109/PDP.2015.113)
Published: 2015-03-04
Abstract: As processors migrate to multi- and many-core architectures, the communication network becomes more important. An efficient communication architecture can drastically improve overall system performance, and taking application behavior into account enables system-level solutions that manage communication cost. To address this, we propose a Clustered Globally Asynchronous Locally Synchronous Network-on-Chip (C-GALS NoC) communication architecture, composed of local synchronous clusters and a global asynchronous network. We also propose a cluster-based communication-aware mapping algorithm (CAM) that maps application tasks onto the C-GALS NoC while minimizing communication cost. According to our results, the combination of the C-GALS NoC and the CAM algorithm provides up to 2x performance improvement and 3x power improvement compared with a regular GALS NoC. Finally, we demonstrate that the C-GALS NoC is standard-cell compatible by synthesizing it with Design Compiler.
Citations: 7
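The objective that a communication-aware mapping minimizes can be illustrated with a simple cost model in which traffic between tasks in the same cluster is cheap and traffic crossing the global asynchronous network is expensive. The sketch below assumes this two-level model; the weights `intra_cost` and `inter_cost` are hypothetical, and this is not the CAM algorithm itself.

```python
def mapping_cost(traffic, assign, intra_cost=1.0, inter_cost=4.0):
    """Total communication cost of a task-to-cluster assignment.

    traffic: dict mapping (src_task, dst_task) -> data volume
    assign:  dict mapping task -> cluster id
    intra_cost / inter_cost: hypothetical per-unit costs for traffic that
    stays inside a synchronous cluster vs. crosses the global network.
    """
    total = 0.0
    for (a, b), volume in traffic.items():
        same_cluster = assign[a] == assign[b]
        total += volume * (intra_cost if same_cluster else inter_cost)
    return total


traffic = {(0, 1): 10, (1, 2): 1, (2, 3): 10}
good = {0: 0, 1: 0, 2: 1, 3: 1}  # heavy pairs kept inside clusters
bad = {0: 0, 1: 1, 2: 0, 3: 1}   # heavy pairs split across clusters
```

Under this model, keeping the heavily communicating pairs together costs 24 units, while splitting them costs 84, which is the kind of gap a communication-aware mapper tries to exploit.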
FIST: A Framework to Interleave Spiking Neural Networks on CGRAs
Tuan Ngyen, Syed M. A. H. Jafri, M. Daneshtalab, A. Hemani, Sergei Dytckov, J. Plosila, H. Tenhunen
DOI: 10.1109/PDP.2015.60 (https://doi.org/10.1109/PDP.2015.60)
Published: 2015-03-04
Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) are emerging as platforms that can meet the high performance demanded by modern embedded applications. In many application domains (e.g., robotics and cognitive embedded systems), CGRAs must simultaneously host processing tasks (e.g., audio/video acquisition) and estimation tasks (e.g., audio/video/image recognition). Recent work has shown that neural networks can significantly improve the efficiency and scalability of estimation algorithms. However, existing CGRAs commonly use the same homogeneous processing resources for both kinds of task. To get the best of both worlds (conventional processing and neural networks), we present FIST, which allows the processing elements and the network to dynamically morph into either a conventional CGRA or a neural network, depending on the hosted application. We chose the DRRA as a vehicle to study the feasibility and overheads of our approach. Synthesis results show that the proposed enhancements incur negligible overheads (4.4% area and 9.1% power) compared with the original DRRA cell.
Citations: 5
Methodologies and Performance Metrics to Evaluate Multiprogram Workloads
Vicent Selfa, J. Sahuquillo, Crispín Gómez Requena, M. E. Gómez
DOI: 10.1109/PDP.2015.74 (https://doi.org/10.1109/PDP.2015.74)
Published: 2015-03-04
Abstract: Multicore processors dominate the microprocessor market, and most research has moved to this kind of processor. Multicore research methods are still immature and are evolving from their single-threaded processor counterparts. Three main research issues must be faced when evaluating performance and energy in multicores. First, multiple simulation methodologies are applied to evaluate these systems, with no agreement on which to use. Second, the nature of multiprogram workloads requires new performance metrics, different from those used for single-threaded processors; many metrics have been defined, and different published works use different ones. Finally, multicore processors are highly complex systems that require sophisticated and complementary simulators (e.g., for energy and performance). This paper aims to help researchers face these three issues. To this end, we compare how they are handled across 28 papers published in 2013 in top computer architecture conferences. Both analytical examples and experimental results are presented to provide insights into multicore research.
Citations: 3
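As a concrete reference point for the metrics discussion, the sketch below computes a few multiprogram metrics that are widely used in the literature: system throughput (STP, also called weighted speedup), average normalized turnaround time (ANTT), harmonic speedup, and a min/max fairness ratio. These are common definitions, not necessarily the ones this paper recommends; inputs are per-program IPC when running alone and when co-running.

```python
def progress(ipc_alone, ipc_shared):
    """Normalized progress of each program: IPC when co-running / IPC alone."""
    return [s / a for a, s in zip(ipc_alone, ipc_shared)]


def stp(ipc_alone, ipc_shared):
    """System throughput (weighted speedup): sum of normalized progress."""
    return sum(progress(ipc_alone, ipc_shared))


def antt(ipc_alone, ipc_shared):
    """Average normalized turnaround time: mean slowdown (lower is better)."""
    p = progress(ipc_alone, ipc_shared)
    return sum(1.0 / x for x in p) / len(p)


def harmonic_speedup(ipc_alone, ipc_shared):
    """Harmonic mean of normalized progress, i.e. the reciprocal of ANTT."""
    return 1.0 / antt(ipc_alone, ipc_shared)


def fairness(ipc_alone, ipc_shared):
    """Ratio of the least to the most favored program's progress (1 = fair)."""
    p = progress(ipc_alone, ipc_shared)
    return min(p) / max(p)
```

The two workloads `[1.0, 0.5]` and `[1.5, 0.25]` (co-run IPCs against alone IPCs `[2.0, 1.0]`) have the same STP of 1.0, yet the second has a worse ANTT and fairness, which illustrates why the choice of metric can change a paper's conclusions.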
A Hadoop-Based Framework for Large-Scale Landmine Detection Using Ubiquitous Big Satellite Imaging Data
S. El-Kazzaz, Ahmed El-Mahdy
DOI: 10.1109/PDP.2015.121 (https://doi.org/10.1109/PDP.2015.121)
Published: 2015-03-04
Abstract: This paper proposes constructing world-wide landmine maps using the free USGS satellite multispectral image archive. Although the available resolution (in excess of 100 m) is not suitable for detecting mines, we seek to exploit the archive's 40 years of earth scans, in which the same locations appear hundreds of times, to improve the resolution significantly to a useful scale. The paper proposes a framework, based on the iterative map-reduce programming model, for handling such 'big image' data. It presents an initial study of reconstructing well-known (large) landmarks from the USGS archive and estimates the computation and space complexities.
Citations: 2
Marching Band: Fault-Tolerance with Replicable Message Delivery Order
Arkadiusz Danilecki
DOI: 10.1109/PDP.2015.52 (https://doi.org/10.1109/PDP.2015.52)
Published: 2015-03-04
Abstract: Marching Band ensures the same total order of message deliveries in every possible execution history, providing replicable execution for a subset of piecewise deterministic applications. With Marching Band, any number of failures can be tolerated using sender-based logging. The main idea of the algorithm is to log and then broadcast each sent message together with a precomputed tag that describes the message's delivery order.
Citations: 0
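The effect of a precomputed ordering tag can be illustrated with a hold-back queue: messages may arrive in any order, but each receiver delivers them strictly by tag, so every execution sees the same delivery sequence. The sketch below assumes the tags are consecutive integers starting at 0 and are unique; how Marching Band actually computes its tags is not described here, and the class name is hypothetical.

```python
import heapq


class TagOrderedDelivery:
    """Hold-back queue that delivers messages in increasing tag order.

    Assumes (hypothetically) that tags are unique integers 0, 1, 2, ...
    with no gaps, so the next deliverable tag is always known.
    """

    def __init__(self):
        self._next = 0
        self._pending = []  # min-heap of (tag, payload)

    def receive(self, tag, payload):
        """Buffer a message and return every message now deliverable."""
        heapq.heappush(self._pending, (tag, payload))
        delivered = []
        while self._pending and self._pending[0][0] == self._next:
            delivered.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return delivered
```

Two replicas that receive the same tagged messages in different arrival orders still deliver them in the same sequence, which is the property that makes re-execution after a failure replicable.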
On the Impact of Energy-Efficient Strategies in HPC Clusters
F. Rossi, Miguel G. Xavier, Yuri J. Monti, C. Rose
DOI: 10.1109/PDP.2015.122 (https://doi.org/10.1109/PDP.2015.122)
Published: 2015-03-04
Abstract: Energy-aware management strategies are a recent trend towards energy-efficient computing in HPC clusters. One approach behind these strategies is to put idle nodes into energy-saving states, alternating among different sleep states that correspond to different power consumption levels. This paper investigates how such energy-efficient strategies affect job turnaround time in these clusters, i.e., the elapsed time between job submission and job completion, including both the wait time and the job's actual execution time. Based on the results, we propose a Best-Fit Energy-Aware Strategy that switches nodes to a sleep state depending on the throughput of the resource manager's job queue. We simulated the proposed strategy using the SimGrid simulator. Our preliminary results show a reduction of up to 19% in overall energy consumption and give a better understanding of the trade-offs involved in using energy-efficient strategies.
Citations: 5
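The turnaround-time definition above (wait time plus execution time) makes the cost of sleep states easy to see: if a job lands on a sleeping node, the wake-up latency is added to its wait. The sketch below encodes that under a hypothetical single-node model; the `wake_latency` parameter and the function names are assumptions, not the paper's simulator setup.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float    # time the job enters the resource manager's queue
    run_time: float  # actual execution time once started


def schedule_on_node(job, node_free_at, node_asleep, wake_latency):
    """Return (start, end, turnaround) for a job placed on a single node.

    Hypothetical model: the job starts once it is submitted and the node is
    free; a sleeping node must first wake up, which delays the start.
    """
    ready = max(job.submit, node_free_at)
    start = ready + (wake_latency if node_asleep else 0.0)
    end = start + job.run_time
    turnaround = end - job.submit  # == wait (start - submit) + run_time
    return start, end, turnaround
```

With a 100 s job and a 30 s wake-up latency, the turnaround grows from 100 s to 130 s if the node was asleep, which is the trade-off an energy-aware policy has to balance against the energy saved.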
Using Active Data to Provide Smart Data Surveillance to E-Science Users
Anthony Simonet, K. Chard, G. Fedak, Ian T Foster
DOI: 10.1109/PDP.2015.76 (https://doi.org/10.1109/PDP.2015.76)
Published: 2015-03-04
Abstract: Modern scientific experiments often involve multiple storage and computing platforms, software tools, and analysis scripts. The resulting heterogeneous environments make data management operations challenging: the large number of events and the absence of data integration make it difficult to track data provenance, manage sophisticated analysis processes, and recover from unexpected situations. Current approaches often require costly human intervention and are inherently error-prone. The difficulties of managing and manipulating such large, highly distributed datasets also limit automated sharing and collaboration. We study a real-world e-Science application involving terabytes of data, three different analysis and storage platforms, and a number of applications and analysis processes. We demonstrate that, using a specialized data life cycle and programming model (Active Data), we can easily implement global progress monitoring and sharing, recover from unexpected events, and automate a range of tasks.
Citations: 7
An Efficient Algorithm for Communication-Based Task Mapping
E. Cruz, M. Diener, L. Pilla, P. Navaux
DOI: 10.1109/PDP.2015.25 (https://doi.org/10.1109/PDP.2015.25)
Published: 2015-03-04
Abstract: Communication between the tasks of a parallel application is an important characteristic to consider when mapping tasks to computing cores, because communication performance can differ between cores. Within a machine, performance differences come from the memory hierarchy: cache memories can be shared by groups of cores, and intra-chip interconnections are faster than inter-chip interconnections. In cluster and grid systems, the network imposes additional communication latency. Mapping tasks that communicate to cores that are close in the memory hierarchy, or to the same nodes in clusters or grids, optimizes the communication of parallel applications and increases performance and energy efficiency. In the task mapping context, one of the most important aspects is the mapping algorithm, since it determines the improvements that can be achieved. Because finding the best mapping is NP-hard, heuristics must be used to find an approximate solution in feasible time. In this paper we present Eager Map, a new communication-based mapping algorithm built on a greedy grouping strategy applied hierarchically. Experimental evaluation indicates that our algorithm runs 10 times faster than the state of the art and yields higher performance improvements. Due to its low execution time and high stability, Eager Map is also suitable for online task mapping, where tasks are migrated during execution.
Citations: 30
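The idea of greedy grouping applied hierarchically can be sketched as follows: at each level of the memory hierarchy, tasks (or groups from the previous level) are greedily packed into groups that communicate heavily, and the communication matrix is then collapsed so the next level works on groups. This is a simplified illustration in the spirit of the abstract, not the published Eager Map algorithm; the seed and tie-breaking rules and the `level_sizes` parameter are assumptions.

```python
def greedy_groups(comm, size):
    """Greedily partition items 0..n-1 into groups of `size` items that
    communicate heavily. comm is a symmetric n x n volume matrix."""
    remaining = set(range(len(comm)))
    groups = []
    while remaining:
        # Seed: the item with the most communication with the rest
        # (ties broken by the smallest index).
        seed = max(remaining,
                   key=lambda i: (sum(comm[i][j] for j in remaining if j != i), -i))
        group = [seed]
        remaining.remove(seed)
        while len(group) < size and remaining:
            # Add the item that communicates most with the current group.
            best = max(remaining,
                       key=lambda j: (sum(comm[g][j] for g in group), -j))
            group.append(best)
            remaining.remove(best)
        groups.append(sorted(group))
    return groups


def collapse(comm, groups):
    """Communication matrix between groups (sum of member-to-member volume)."""
    m = len(groups)
    return [[0 if a == b else sum(comm[i][j] for i in groups[a] for j in groups[b])
             for b in range(m)] for a in range(m)]


def hierarchical_map(comm, level_sizes):
    """level_sizes, e.g. [2, 2]: pair tasks that share a cache, then pair
    those pairs on a socket. Returns task groups in placement order."""
    groups = [[t] for t in range(len(comm))]
    for size in level_sizes:
        top = greedy_groups(comm, size)
        groups = [[t for g in grp for t in groups[g]] for grp in top]
        comm = collapse(comm, top)
    return groups


comm = [[0, 1, 10, 0],
        [1, 0, 0, 10],
        [10, 0, 0, 1],
        [0, 10, 1, 0]]
```

For this matrix, tasks 0 and 2 and tasks 1 and 3 are paired at the first level, and the final placement order is 0, 2, 1, 3, so each heavily communicating pair lands on neighboring cores.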