2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: Latest Papers

An Evaluation of Different I/O Techniques for Checkpoint/Restart

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.145

Faisal Shahzad, M. Wittmann, T. Zeiser, G. Hager, G. Wellein

Abstract: Today's High Performance Computing (HPC) clusters consist of hundreds of thousands of CPUs, memory units, complex networks, and other components. Such an extreme level of hardware parallelism reduces the mean time to failure (MTTF) of the overall cluster. The future of HPC urgently demands environments that enable programs to run successfully even in the presence of failures. Checkpoint/Restart (C/R) is one of the most common techniques for providing fault tolerance. C/R is relatively easy to implement, but it typically introduces significant overhead in the runtime of the application. In this paper, a checkpointing technique is presented that significantly reduces the checkpoint overhead and is highly scalable. This is achieved by overlapping the I/O for writing the checkpoint with the computation of the application. For this asynchronous checkpointing technique, a theoretical model is developed to estimate the checkpoint overhead. An implementation of this technique is then benchmarked and compared with other checkpointing strategies. We show that our approach has marginal overhead, as opposed to standard synchronous checkpointing, for typical application scenarios. A comparison with the node-level checkpointing technique using the Scalable Checkpoint/Restart (SCR) library is also presented.
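
The core idea in this abstract, overlapping checkpoint I/O with computation, can be illustrated with a minimal single-process sketch. It is not the paper's implementation: it assumes a background thread as the asynchronous writer, `pickle` as the checkpoint format, and hypothetical names `run` and `write_checkpoint`. The state is copied in memory so the main loop can keep computing while the copy is written out.

```python
import copy
import os
import pickle
import threading


def write_checkpoint(snapshot, path):
    """Serialize one snapshot to disk; runs on a background thread."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(snapshot, f)
    # Atomic rename: a crash mid-write never leaves a truncated file at `path`.
    os.replace(tmp, path)


def run(steps, checkpoint_every, path):
    """Toy iterative solver (hypothetical) that checkpoints asynchronously."""
    state = {"step": 0, "data": [0.0] * 1000}
    writer = None
    for step in range(1, steps + 1):
        # Stand-in for the application's computation.
        state["data"] = [x + 1.0 for x in state["data"]]
        state["step"] = step
        if step % checkpoint_every == 0:
            if writer is not None:
                writer.join()  # keep at most one checkpoint write in flight
            # Copy the state so computation continues while the copy is written.
            snapshot = copy.deepcopy(state)
            writer = threading.Thread(target=write_checkpoint, args=(snapshot, path))
            writer.start()
    if writer is not None:
        writer.join()  # wait for the last checkpoint write to finish
    return state
```

A synchronous variant would call `write_checkpoint` directly inside the loop and stall the computation for the whole write; the in-memory copy is the price paid for overlap. In CPython the overlap is real only while the writer thread is blocked in file I/O, since pickling itself holds the GIL.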

Citations: 18

Harnessing Adaptivity Analysis for the Automatic Design of Efficient Embedded and HPC Systems

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.230

S. Lovergine, Fabrizio Ferrandi

Abstract: In the past decades, design methodologies for Embedded Systems (ES) and High Performance Computing (HPC) systems have evolved along different trends. Lately, however, both domains have been facing common issues whose solutions converge toward similar approaches. Examples of issues affecting both domains are: large degrees of parallelism, heterogeneity, power constraints, reliability issues, self-adaptation, and the significant programming effort needed to reach the desired performance on increasingly complex architectures. Systems able to dynamically adjust their behavior at run time appear to be good candidates for the next computing generation, and will most probably condemn non-adaptable systems to rapid extinction. Adaptive systems can deal with uncertain and unpredictable conditions, due, for example, to reliability issues. In this paper we show how adaptivity analysis can be exploited to address several design challenges in embedded systems. The results show an average performance increase of around 34% with respect to the state-of-the-art methodology, with limited area overhead. Furthermore, we discuss our work in progress on exploiting adaptivity analysis to address new challenges in HPC systems design.

Citations: 1

Network Decontamination from a Black Virus

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.115

Jie Cai, P. Flocchini, N. Santoro

Abstract: In this paper, we consider the problem of decontaminating a network from a black virus (BV) using a team of mobile system agents. The BV is a harmful process which, like the extensively studied black hole (BH), destroys any agent arriving at the network site where it resides; when that occurs, unlike a black hole, which is static by definition, a BV moves, spreading to all the neighboring sites and thus increasing its presence in the network. If, however, one of these sites contains a system agent, that clone of the BV is destroyed (i.e., removed permanently from the system). The initial location of the BV is unknown a priori. The objective is to permanently remove any presence of the BV from the network with the minimum number of site infections (and thus casualties). The main cost measure is the total number of agents needed to solve the problem. This problem integrates in its definition both the harmful aspects of the classical black hole search problem (where, however, the dangerous elements are static) and the mobility aspects of the classical intruder capture or network decontamination problem (where, however, there is no danger for the agents). Thus, it is the first attempt to model mobile intruders that are harmful not only for the sites but also for the agents. We start the study of this problem by focusing on some important classes of interconnection networks: grids, tori, and hypercubes. For each class we present solution protocols and strategies for the team of agents, analyze their worst-case complexity, and prove their optimality.
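
The local spreading rule described in this abstract can be made concrete with a small sketch. It assumes a 2-D grid with 4-neighbour adjacency, that the site the BV occupied becomes clean once the BV spreads, and that a clone arriving at a guarded site is destroyed without harming the guard. The function names are hypothetical, and the sketch does not implement the paper's decontamination protocols.

```python
def grid_neighbors(node, rows, cols):
    """4-neighbour adjacency on a rows x cols grid (no wraparound)."""
    r, c = node
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def agent_enters_bv_site(bv_sites, guarded, site, rows, cols):
    """Return the new set of BV sites after an agent moves onto `site`.

    The arriving agent is destroyed (one casualty); the BV leaves `site` and
    sends a clone to every neighbour. Clones reaching a guarded neighbour are
    destroyed there, while unguarded neighbours become newly infected.
    """
    if site not in bv_sites:
        raise ValueError("site does not host the black virus")
    new_sites = set(bv_sites) - {site}
    for nb in grid_neighbors(site, rows, cols):
        if nb not in guarded:
            new_sites.add(nb)
    return new_sites
```

Under these assumptions, guarding every neighbour of a suspect site before probing it limits the damage to the single probing agent, since every clone lands on a guard. How many agents such a strategy needs on grids, tori and hypercubes is the subject of the paper's results, which this sketch does not reproduce.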

Citations: 11

Scalable Loop Self-Scheduling Schemes Implemented on Large-Scale Clusters

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.105

Yiming Han, Anthony T. Chronopoulos

Citations: 13

Efficient and Fault-Tolerant Static Scheduling for Grids

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.94

Patrick Cichowski, J. Keller

Abstract: Static task graphs model a variety of parallel applications and are used to schedule such applications on grid platforms. While the scheduling is static, i.e. done prior to execution, processors might fail or not deliver their expected performance, especially if the grid comprises nodes with donated time that may be used or shut down by their owner at any time. We extend a prior proposal for fault-tolerant grid scheduling with task duplication to also cover situations where tasks take much longer than expected from the schedule, treating this as a special kind of fault. Furthermore, we consider the time for communication between dependent tasks when placing duplicates. We evaluate both scenarios with a simulator that injects faults and slowdowns into processors, using workloads from a benchmark suite of task graphs with a variety of structures. Our results indicate that the overhead in the fault-free case is negligible, that a processor failure mostly increases the schedule makespan only moderately because duplicates can use gaps in the original schedule, and that the effects of a processor slowdown can partly be mitigated by aborting a (slow) task and running its duplicate.
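
Why a duplicate helps under both failures and slowdowns can be shown with a toy finish-time calculation. This is a minimal sketch, not the authors' scheduler: it assumes one primary copy and at most one duplicate per task, a multiplicative slowdown factor per processor, and it ignores the inter-task communication times and dependencies that the paper accounts for. `TaskCopy` and `effective_finish` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class TaskCopy:
    processor: str
    start: float     # scheduled start time (a duplicate may sit in a schedule gap)
    duration: float  # expected run time on this processor


def effective_finish(primary: TaskCopy, duplicate: Optional[TaskCopy],
                     failed: Set[str], slowdown: Dict[str, float]) -> float:
    """Earliest time any surviving copy of the task completes.

    A copy on a failed processor never completes; a copy on a slowed-down
    processor takes `duration * factor`. Taking the minimum models aborting
    the slower copy as soon as the faster one finishes.
    """
    finishes = []
    for task_copy in (primary, duplicate):
        if task_copy is None or task_copy.processor in failed:
            continue
        factor = slowdown.get(task_copy.processor, 1.0)
        finishes.append(task_copy.start + task_copy.duration * factor)
    return min(finishes) if finishes else float("inf")
```

For a primary on P1 over [0, 10] and a duplicate on P2 starting at 4, a failure of P1 delays completion only from 10 to 14, and a threefold slowdown of P1 is capped at the same 14 instead of 30.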

Citations: 5

Designing Hybrid Architectures for Massive-Scale Graph Analysis

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.172

David Ediger, David A. Bader

Citations: 2

Network-on-Chip with Long-Range Wireless Links for High-Throughput Scientific Computation

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.72

Turbo Majumder, P. Pande, A. Kalyanaraman

Abstract: Several emerging application domains in scientific computing demand high computation throughput to achieve terascale or higher performance. Dedicated centers hosting scientific computing tools on a few high-end servers could rely on hardware accelerator co-processors that contain multiple lightweight custom cores interconnected through an on-chip network. While network-on-chip (NoC) driven platforms have been studied in the context of accelerating individual applications, this work studies the efficacy of NoC-based platforms in enhancing overall computation throughput in the presence of several concurrently executing jobs. The use of long-range links has been shown to reduce network diameter, and we use this property in conjunction with different resource allocation strategies to deliver high throughput. Our experiments, using a computational biology application suite as a demonstration study, show that the use of long-range wireless shortcuts coupled with the appropriate resource allocation strategy delivers a computation throughput of over 10^11 operations per second, consuming ~0.5 nJ per operation.
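
The abstract's premise that long-range links reduce network diameter can be checked on a toy topology. This sketch assumes a 4x4 2-D mesh NoC and two hypothetical corner-to-corner shortcuts; it computes hop-count diameter by breadth-first search and says nothing about the paper's actual wireless link placement or resource allocation.

```python
from collections import deque
from itertools import product


def mesh(rows, cols):
    """Adjacency sets for a 2-D mesh NoC (each core linked to its 4 neighbours)."""
    adj = {(r, c): set() for r, c in product(range(rows), range(cols))}
    for r, c in adj:
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < rows and nc < cols:
                adj[(r, c)].add((nr, nc))
                adj[(nr, nc)].add((r, c))
    return adj


def add_links(adj, links):
    """Return a copy of `adj` with extra bidirectional long-range links."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    for a, b in links:
        new[a].add(b)
        new[b].add(a)
    return new


def eccentricity(adj, src):
    """Largest hop count from `src` to any reachable node (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)
```

The plain 4x4 mesh has diameter 6. A single shortcut between (0, 0) and (3, 3) leaves it at 6, because the other pair of opposite corners gains nothing from that link; adding a second shortcut between (0, 3) and (3, 0) lowers it. Where shortcuts are placed matters, not just how many there are.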

Citations: 2

Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.243

Akshay Venkatesh, K. Kandalla, D. Panda

Abstract: The energy consumed by modern supercomputing systems continues to grow at an alarming rate. The Message Passing Interface (MPI) has been the de facto programming model for parallel applications, and MPI libraries have been designed to achieve the best communication performance on modern architectures. However, the performance and energy trade-offs of these designs have not been studied. Hence, it is critical to understand the energy consumption characteristics of MPI routines and the performance-energy trade-offs of the various protocols and designs used in MPI libraries. The first hurdle in achieving this objective is to design a framework that can measure the energy consumption of various components during communication operations. The RAPL interface allows users to measure energy across various domains on the Intel Sandy Bridge processor in a low-overhead, non-intrusive manner. However, this interface has certain limitations and cannot be used directly to measure the energy profiles of MPI operations in a fine-grained manner. In this paper, we propose a novel methodology to address these limitations. We propose a new shared-memory window-based solution to accurately measure the aggregate energy consumed by all processes engaged in MPI operations. Using our proposed framework, we demonstrate the impact of various communication protocols and progress mechanisms on energy consumption. Our evaluations demonstrate that kernel-based solutions can potentially lead to lower energy consumption for intra-node communication operations. Further, our framework reveals possible energy bottlenecks in scaling important collective operations, such as MPI_Allreduce. In addition, we use our proposed framework to study the energy consumption characteristics of MPI calls in the NAS-IS benchmark, and we infer that the choice of progress mechanism can lead to about 6% energy savings for the processors.
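
The basic building block behind RAPL-based measurement is reading a cumulative energy counter before and after a code region. The sketch below assumes the Linux powercap sysfs interface (`energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0`, the package-0 domain), which may differ from the access path the paper used, and assumes at most one counter wraparound between readings. It is not the paper's shared-memory window framework; `measure` and `energy_delta_uj` are hypothetical helpers. On recent kernels, reading `energy_uj` typically requires root privileges.

```python
from pathlib import Path

# Package-0 RAPL domain as exposed by the Linux powercap framework.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name, rapl_dir=RAPL_DIR):
    """Read one integer-valued powercap attribute (microjoules)."""
    return int((rapl_dir / name).read_text())


def energy_delta_uj(before, after, max_range_uj):
    """Energy consumed between two readings, tolerating one counter wraparound."""
    if after >= before:
        return after - before
    return after + max_range_uj - before


def measure(fn, rapl_dir=RAPL_DIR):
    """Run fn() and return (result, joules consumed by the package domain)."""
    max_range = read_counter("max_energy_range_uj", rapl_dir)
    before = read_counter("energy_uj", rapl_dir)
    result = fn()
    after = read_counter("energy_uj", rapl_dir)
    return result, energy_delta_uj(before, after, max_range) / 1e6
```

Because the counter covers the whole package, this measures every process on the socket at once. That coarse, per-domain view is one of the limitations the abstract mentions for attributing energy to individual MPI operations.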

Citations: 15

Dataset Scaling and MapReduce Performance

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.143

Fan Zhang, M. Sakr

Citations: 9

How to Scale Dynamic Tuning to Large Parallel Applications

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum Pub Date : 2013-05-20 DOI: 10.1109/IPDPSW.2013.31

Andrea Martínez, A. Sikora, Eduardo César, Joan Sorribes

Citations: 3