2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: Latest Articles

An Evaluation of Different I/O Techniques for Checkpoint/Restart
Faisal Shahzad, M. Wittmann, T. Zeiser, G. Hager, G. Wellein
{"title":"An Evaluation of Different I/O Techniques for Checkpoint/Restart","authors":"Faisal Shahzad, M. Wittmann, T. Zeiser, G. Hager, G. Wellein","doi":"10.1109/IPDPSW.2013.145","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.145","url":null,"abstract":"Today's High Performance Computing (HPC) clusters consist of hundreds of thousands of CPUs, memory units, complex networks, and other components. Such an extreme level of hardware parallelism reduces the mean time to failure (MTTF) of the overall cluster. The future of HPC urgently demands the development of environments that allow programs to run successfully even in the presence of failures. Checkpoint/Restart (C/R) is one of the most common techniques to provide fault tolerance. C/R is relatively easy to implement, but typically it introduces significant overhead in the runtime of the application. In this paper, a checkpointing technique is presented that significantly reduces the checkpoint overhead and is highly scalable. This is achieved by overlapping the I/O for writing the checkpoint with the computation of the application. For this asynchronous checkpointing technique, a theoretical model is developed to estimate the checkpoint overhead. An implementation of this technique is then benchmarked and compared with other checkpointing strategies. We show our approach to have marginal overhead as opposed to standard synchronous checkpointing for typical application scenarios. A comparison with the node-level checkpointing technique using the Scalable Checkpoint/Restart (SCR) library is also presented.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121650576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Harnessing Adaptivity Analysis for the Automatic Design of Efficient Embedded and HPC Systems
S. Lovergine, Fabrizio Ferrandi
{"title":"Harnessing Adaptivity Analysis for the Automatic Design of Efficient Embedded and HPC Systems","authors":"S. Lovergine, Fabrizio Ferrandi","doi":"10.1109/IPDPSW.2013.230","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.230","url":null,"abstract":"In the past decades, design methodologies of Embedded Systems (ES) and High Performance Computing (HPC) systems have evolved following different trends. However, they are lately experiencing issues that affect both domains, whose solutions converge to similar approaches. Examples of issues affecting both domains are: large parallelism degrees, heterogeneity, power constraints, reliability issues, self-adaptation, and significant programming efforts to reach the desired performance on increasingly complex architectures. Systems able to dynamically adjust their behavior at run-time appear good candidates for the next computing generation, and will most probably condemn non-adaptable systems to rapid extinction. Adaptive systems can deal with uncertain and unpredictable conditions, due, for example, to reliability issues. In this paper we show how we can exploit adaptivity analysis to address several design challenges in embedded systems. The results show an average increase in performance around 34% with respect to state-of-the-art methodology, with a limited area overhead. Furthermore, we discuss our work-in-progress on the exploitation of adaptivity analysis to address new challenges in HPC systems design.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127187997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Network Decontamination from a Black Virus
Jie Cai, P. Flocchini, N. Santoro
{"title":"Network Decontamination from a Black Virus","authors":"Jie Cai, P. Flocchini, N. Santoro","doi":"10.1109/IPDPSW.2013.115","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.115","url":null,"abstract":"In this paper, we consider the problem of decontaminating a network from a black virus (BV) using a team of mobile system agents. The BV is a harmful process which, like the extensively studied black hole (BH), destroys any agent arriving at the network site where it resides; when that occurs, unlike a black hole which is static by definition, a BV moves, spreading to all the neighboring sites, thus increasing its presence in the network. If however one of these sites contains a system agent, that clone of the BV is destroyed (i.e., removed permanently from the system). The initial location of the BV is unknown a priori. The objective is to permanently remove any presence of the BV from the network with the minimum number of site infections (and thus casualties). The main cost measure is the total number of agents needed to solve the problem. This problem integrates in its definition both the harmful aspects of the classical black hole search problem (where however the dangerous elements are static) with the mobility aspects of the classical intruder capture or network decontamination problem (where however there is no danger for the agents). Thus, it is the first attempt to model mobile intruders harmful not only for the sites but also for the agents. We start the study of this problem by focusing on some important classes of interconnection networks: grids, tori, and hypercubes. For each class we present solution protocols and strategies for the team of agents, analyze their worst case complexity, and prove their optimality.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Scalable Loop Self-Scheduling Schemes Implemented on Large-Scale Clusters
Yiming Han, Anthony T. Chronopoulos
{"title":"Scalable Loop Self-Scheduling Schemes Implemented on Large-Scale Clusters","authors":"Yiming Han, Anthony T. Chronopoulos","doi":"10.1109/IPDPSW.2013.105","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.105","url":null,"abstract":"Loops are the largest source of parallelism in many scientific applications. Parallelization of irregular loop applications is a challenging problem to achieve scalable performance on large-scale multi-core clusters. Previous research proposed an effective Master-Worker model on clusters for distributed self-scheduling schemes that apply to parallel loops with independent iterations. However, this model has not been applied to large-scale clusters. In this paper, we present an extension of the distributed self-scheduling schemes implemented in a hierarchical Master-Worker model. Our experiments with different self-scheduling schemes demonstrate good scalability when scaling up to 8,192 processors.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133651586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Efficient and Fault-Tolerant Static Scheduling for Grids
Patrick Cichowski, J. Keller
{"title":"Efficient and Fault-Tolerant Static Scheduling for Grids","authors":"Patrick Cichowski, J. Keller","doi":"10.1109/IPDPSW.2013.94","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.94","url":null,"abstract":"Static task graphs model a variety of parallel applications, and are used to schedule such applications in grid platforms. While the scheduling is static, i.e. done prior to execution, processors might fail or not deliver their performance, especially if the grid comprises nodes with donated time, that may be used or shut down by their owner at any time. We extend a prior proposal for fault-tolerant grid scheduling with task duplication to also cover situations where tasks take much longer than expected from the schedule as a special kind of fault. Furthermore, we consider the time for communication between dependent tasks when placing duplicates. We evaluate both scenarios with a simulator that injects faults and slowdowns to processors, and workloads from a benchmark suite of task graphs with a variety of structures. Our results indicate that the overhead in the fault-free case is negligible, that a processor failure mostly increases the schedule makespan only moderately because duplicates can use gaps in the original schedule, and that the effects of a processor slowdown can partly be mitigated by aborting a (slow) task and running its duplicate.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130792549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Designing Hybrid Architectures for Massive-Scale Graph Analysis
David Ediger, David A. Bader
{"title":"Designing Hybrid Architectures for Massive-Scale Graph Analysis","authors":"David Ediger, David A. Bader","doi":"10.1109/IPDPSW.2013.172","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.172","url":null,"abstract":"Turning large volumes of data into actionable knowledge is a top challenge in high performance computing. Our previous work in this area demonstrated algorithmic techniques for massively parallel graph analysis on multithreaded systems. This work led to the development of GraphCT, the first end-to-end graph analytics platform for the Cray XMT and x86-class systems with OpenMP, and STINGER, a high performance, multithreaded, dynamic graph data structure and algorithms. Both of these packages are freely available as open source software. This dissertation research culminates in experimental and analytical techniques to study the marriage of disk-based systems, such as Hadoop, with shared memory-based systems, such as the Cray XMT, for data-intensive applications. David Ediger is a fifth year PhD candidate in Electrical and Computer Engineering.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116979860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Network-on-Chip with Long-Range Wireless Links for High-Throughput Scientific Computation
Turbo Majumder, P. Pande, A. Kalyanaraman
{"title":"Network-on-Chip with Long-Range Wireless Links for High-Throughput Scientific Computation","authors":"Turbo Majumder, P. Pande, A. Kalyanaraman","doi":"10.1109/IPDPSW.2013.72","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.72","url":null,"abstract":"Several emerging application domains in scientific computing demand high computation throughputs to achieve terascale or higher performance. Dedicated centers hosting scientific computing tools on a few high-end servers could rely on hardware accelerator co-processors that contain multiple lightweight custom cores interconnected through an on-chip network. While network-on-chip (NoC) driven platforms have been studied in the context of accelerating individual applications, this work studies the efficacy of NoC-based platforms to enhance overall computation throughput in the presence of several concurrently executing jobs. Use of long-range links has been shown to reduce network diameter and we use this property in conjunction with different resource allocation strategies to deliver high throughput. Our experiments using a computational biology application suite as a demonstration study show that the use of long-range wireless shortcuts coupled with the appropriate resource allocation strategy delivers computation throughput over 10^11 operations per second, consuming ~0.5 nJ per operation.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131038329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL
Akshay Venkatesh, K. Kandalla, D. Panda
{"title":"Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL","authors":"Akshay Venkatesh, K. Kandalla, D. Panda","doi":"10.1109/IPDPSW.2013.243","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.243","url":null,"abstract":"The energy consumed by modern supercomputing systems continues to grow at an alarming rate. The Message Passing Interface (MPI) has been the de facto programming model for parallel applications and MPI libraries have been designed to achieve the best communication performance on modern architectures. However, the performance and energy trade-offs of these designs have not been studied. Hence, it is critical to understand the energy consumption characteristics of MPI routines and the performance-energy trade-offs of various protocols and designs that are used in MPI libraries. The first hurdle in achieving this objective is to design a framework that can be used to measure energy consumption of various components during communication operations. The RAPL interface allows users to measure energy across various domains on the Intel Sandy-Bridge processor, in a low-overhead, non-intrusive manner. However, this interface has certain limitations and cannot be directly used to measure energy profiles of MPI operations in a fine-grained manner. In this paper, we propose a novel methodology to address these limitations. We propose a new shared-memory window-based solution to accurately measure the aggregate energy consumed by all processes engaged in MPI operations. Using our proposed framework, we demonstrate the impact of various communication protocols and progress mechanisms on the energy consumption. Our evaluations demonstrate that the kernel-based solutions can potentially lead to lower energy consumption for intra-node communication operations. Further, our framework also reveals possible energy bottlenecks in scaling important collective operations, such as MPI_Allreduce. In addition, we also use our proposed framework to study the energy consumption characteristics of MPI calls in the NAS-IS benchmark and we infer that the choice of progress mechanism can lead to about 6% energy savings for the processors.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"2 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131398186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Dataset Scaling and MapReduce Performance
Fan Zhang, M. Sakr
{"title":"Dataset Scaling and MapReduce Performance","authors":"Fan Zhang, M. Sakr","doi":"10.1109/IPDPSW.2013.143","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.143","url":null,"abstract":"Predicting execution behavior of MapReduce applications when scaling the input dataset presents a challenging problem. The difficulty lies in the distributed locations of input data and the distributed, virtualized compute resources that utilize different network substrates. The potential payoff lies in using small datasets and limited test runs to understand how applications will behave with \"big data.\" Our research has developed an in-depth understanding of MapReduce application performance and analyzed the impact of scaling input datasets. While we might expect that \"embarrassingly parallel\" MapReduce jobs should scale linearly with input dataset size, our results show that execution time sometimes increases nonlinearly. To verify our predictions, we identify a benchmark set of Map-, Shuffle-, and Reduce-intensive applications. Experimental results show that our execution-time analysis distinguishes four typical application behaviors when scaling input datasets.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128093071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
How to Scale Dynamic Tuning to Large Parallel Applications
Andrea Martínez, A. Sikora, Eduardo César, Joan Sorribes
{"title":"How to Scale Dynamic Tuning to Large Parallel Applications","authors":"Andrea Martínez, A. Sikora, Eduardo César, Joan Sorribes","doi":"10.1109/IPDPSW.2013.31","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.31","url":null,"abstract":"Current performance analysis and tuning tools must be able to improve the performance of large-scale parallel applications. To be effective, such analysis and tuning tools must be scalable and be able to manage the dynamic behaviour of parallel applications. This work presents a scalable solution for dynamic tuning. This approach is based on a hierarchical performance analysis architecture that uses a novel information abstraction mechanism to solve local and global performance problems. We have developed a prototype implementation of the proposed analysis architecture making use of the MRNet framework. Scalability experiments have been performed using this prototype with up to 6400 application tasks. The results obtained show that the proposed analysis architecture will provide the scalability required to carry out dynamic tuning of large-scale parallel applications.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134542389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3