2008 IEEE International Symposium on Parallel and Distributed Processing: Latest Papers (Page 8)

Large-scale experiment of co-allocation strategies for Peer-to-Peer supercomputing in P2P-MPI

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536212

S. Genaud, Choopan Rattanapoka

Abstract: High-performance computing generally involves deploying parallel applications on the multiple resources used for the computation. The problem of scheduling an application across distributed resources is termed co-allocation. In a grid context, co-allocation is difficult because the grid middleware must cope with a dynamic environment. Middleware architectures built on a peer-to-peer (P2P) basis have been proposed to tackle most limitations of centralized systems. Some of the issues addressed by P2P systems are fault tolerance, ease of maintenance, and scalability in resource discovery. However, the lack of global knowledge makes scheduling difficult in P2P systems. In this paper, we present the new developments concerning locality awareness as well as the co-allocation strategies available in the latest release of P2P-MPI: (i) the spread strategy tries to map processes onto hosts so as to maximize the total amount of available memory while maintaining locality of processes as a secondary objective; (ii) the concentrate strategy tries to maximize locality between processes by using as many cores as hosts offer. The co-allocation scheme has been devised to be simple for the user and meets the main high-performance computing requirement, which is locality. Extensive experiments have been conducted on Grid5000 with up to 600 processes on 6 sites throughout France. Results show that we achieved the targeted goals in these real conditions.
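
The difference between the two strategies named in this abstract can be illustrated with a minimal sketch. It is not P2P-MPI's implementation: the `Host` record, the use of a measured latency as the locality metric, and the wrap-around rule in `spread` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    latency_ms: float  # assumed locality metric: round-trip time to the submitter
    cores: int


def concentrate(hosts, n_procs):
    """Fill the closest hosts first, using every core each host offers."""
    plan = {}
    remaining = n_procs
    for host in sorted(hosts, key=lambda h: h.latency_ms):
        if remaining == 0:
            break
        take = min(host.cores, remaining)
        plan[host.name] = take
        remaining -= take
    if remaining:
        raise ValueError("not enough cores for the requested processes")
    return plan


def spread(hosts, n_procs):
    """Place one process per host, closest hosts first, and only revisit a
    host once every other host with a free core has received a process."""
    ordered = sorted(hosts, key=lambda h: h.latency_ms)
    if n_procs > sum(h.cores for h in ordered):
        raise ValueError("not enough cores for the requested processes")
    plan = {h.name: 0 for h in ordered}
    remaining = n_procs
    while remaining:
        for host in ordered:
            if remaining == 0:
                break
            if plan[host.name] < host.cores:
                plan[host.name] += 1
                remaining -= 1
    # Drop hosts that ended up with no process.
    return {name: count for name, count in plan.items() if count}
```

With three hypothetical hosts `a` (0.1 ms, 4 cores), `b` (0.2 ms, 2 cores) and `c` (5.0 ms, 2 cores) and four processes, `concentrate` places all four on `a`, whereas `spread` uses all three hosts and so reaches more aggregate memory at the cost of locality.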

Citations: 20

Enhancing application robustness through adaptive fault tolerance

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536383

Z. Lan, Yawei Li, Ziming Zheng, Prashasta Gujrati

Citations: 8

Modeling and predicting application performance on parallel computers using HPC challenge benchmarks

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536278

W. Pfeiffer, N. Wright

Abstract: A method is presented for modeling application performance on parallel computers in terms of the performance of microkernels from the HPC Challenge benchmarks. Specifically, the application run time is expressed as a linear combination of inverse speeds and latencies from microkernels or system characteristics. The model parameters are obtained by an automated series of least squares fits using backward elimination to ensure statistical significance. If necessary, outliers are deleted to ensure that the final fit is robust. Typically three or four terms appear in each model: at most one each for floating-point speed, memory bandwidth, interconnect bandwidth, and interconnect latency. Such models allow prediction of application performance on future computers from easier-to-make predictions of microkernel performance. The method was used to build models for four benchmark problems involving the PARATEC and MILC scientific applications. These models not only describe performance well on the ten computers used to build the models, but also do a good job of predicting performance on three additional computers with newer design features. For the four application benchmark problems with six predictions each, the relative root mean squared error in the predicted run times varies between 13 and 16%. The method was also used to build models for the HPL and G-FFTE benchmarks in HPCC, including functional dependences on problem size and core count from complexity analysis. The model for HPL predicts performance even better than the application models do, while the model for G-FFTE systematically underpredicts run times.
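
The model form described in this abstract can be written out as follows. The symbols are assumptions introduced here for illustration: $s_i$ is a measured microkernel speed (floating-point rate, memory bandwidth, or interconnect bandwidth), $\ell$ is the interconnect latency, and $a_i$, $b$ are the coefficients obtained from the least squares fit. The error definition is one common reading of "relative root mean squared error" and may differ in detail from the paper's.

```latex
% Run time as a linear combination of inverse speeds and a latency term
\[
  T_{\text{app}} \;\approx\; \sum_{i} \frac{a_i}{s_i} \;+\; b\,\ell
\]

% Assumed definition of the relative RMS error over N predictions
\[
  \varepsilon_{\text{rel}} \;=\;
  \sqrt{\frac{1}{N}\sum_{n=1}^{N}
    \left(\frac{T^{\text{pred}}_{n} - T^{\text{meas}}_{n}}{T^{\text{meas}}_{n}}\right)^{2}}
\]
```

Backward elimination then amounts to starting from all candidate terms and repeatedly dropping the least significant one until every remaining coefficient is statistically significant, which is consistent with the three or four terms the abstract reports per model.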

Citations: 34

Experiences in scaling scientific applications on current-generation quad-core processors

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536342

K. Barker, K. Davis, A. Hoisie, D. Kerbyson, M. Lang, S. Pakin, J. Sancho

Citations: 14

Self-stabilizing algorithms for sorting and heapification

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536327

Doina Bein, A. Datta, L. Larmore

Citations: 1

Facilitating efficient synchronization of asymmetric threads on hyper-threaded processors

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536358

Nikos Anastopoulos, N. Koziris

Abstract: So far, the privileged instructions MONITOR and MWAIT, introduced with the Intel Prescott core, have been used mostly for inter-thread synchronization in operating system code. In a hyper-threaded processor, these instructions offer a "performance-optimized" way for threads involved in synchronization events to wait on a condition. In this work, we explore the potential of using these instructions for synchronizing application threads that execute on hyper-threaded processors and are characterized by workload asymmetry. Initially, we propose a framework through which one can use MONITOR/MWAIT to build condition wait and notification primitives with minimal kernel involvement. Then, we evaluate the efficiency of these primitives in a bottom-up manner: at first, we quantify certain performance aspects of the primitives that reflect the execution model under consideration, such as resource consumption and responsiveness, and we compare them against other commonly used implementations. As a further step, we use our primitives to build synchronization barriers. Again, we examine the same performance issues as before, and using a pseudo-benchmark we evaluate the efficiency of our implementation for fine-grained inter-thread synchronization. In terms of throughput, our barriers yielded 12% better performance on average compared to Pthreads, and 26% compared to a spin-loop-based implementation, for varying levels of thread asymmetry. Finally, we test our barriers in a real-world scenario, specifically in applying thread-level Speculative Precomputation to four applications. For this multithreaded execution scheme, our implementation provided up to 7% better performance compared to Pthreads, and up to 40% compared to spin-loop-based barriers.
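
The step from condition wait and notification primitives to a barrier can be sketched as follows. This is only an illustration of the layering: Python cannot issue MONITOR/MWAIT, so `threading.Condition` stands in for the paper's primitives, and the class name `CondBarrier` and the generation-counter scheme are assumptions, not the authors' design.

```python
import threading


class CondBarrier:
    """Reusable barrier built only from a condition wait/notify primitive."""

    def __init__(self, parties):
        self._parties = parties
        self._waiting = 0
        self._generation = 0  # incremented each time the barrier opens
        self._cond = threading.Condition()

    def wait(self):
        with self._cond:
            generation = self._generation
            self._waiting += 1
            if self._waiting == self._parties:
                # Last arrival: reset the count and release everyone.
                self._waiting = 0
                self._generation += 1
                self._cond.notify_all()
            else:
                # Earlier arrivals sleep until the generation changes.
                while generation == self._generation:
                    self._cond.wait()


def run_demo(n_threads=4, rounds=3):
    """Each thread logs its round number, then waits at the barrier."""
    barrier = CondBarrier(n_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for round_no in range(rounds):
            with log_lock:
                log.append(round_no)
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Because no thread can start round r+1 before every thread has logged round r, the log returned by `run_demo` is always grouped by round, whatever the thread interleaving.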

Citations: 24

Parallel mining of closed quasi-cliques

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536250

Yuzhou Zhang, Jianyong Wang, Zhiping Zeng, Lizhu Zhou

Citations: 5

Mobility control schemes with quick convergence in wireless sensor networks

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536119

Xiao Chen, Zhen Jiang, Jie Wu

Citations: 13

LiteLoad: Content unaware routing for localizing P2P protocols

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536204

Shay Horovitz, D. Dolev

Abstract: In today's extensive worldwide Internet traffic, some 60% of network congestion is caused by peer-to-peer sessions. Consequently, ISPs face many challenges: paying for the added traffic, poor customer satisfaction due to a degraded broadband experience, purchasing costly backbone links and upstream bandwidth, and difficulty controlling P2P traffic effectively with conventional devices. Existing solutions such as caching and indexing of P2P content are controversial, as their legality is uncertain due to copyright violation, and are therefore rarely installed by ISPs. In addition, these solutions cannot handle the encrypted protocols that are on the rise in popular P2P networks. Other solutions that employ traffic shaping and blocking degrade the downloading throughput and cause end users to switch ISPs for a better service. LiteLoad discerns patterns of user communications in peer-to-peer file sharing networks without identifying the content being requested or transferred, and uses least-cost routing rules to push peer-to-peer transfers into confined network segments. This approach maintains the performance of file transfer, as opposed to traffic shaping solutions, and precludes Internet provider involvement in caching, cataloguing or indexing of the shared content. Simulation results show the potential of the solution, and a proof of concept of the key technology is demonstrated on popular protocols, including encrypted ones.

Citations: 10

A plug-and-play model for evaluating wavefront computations on parallel architectures

2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536243

G. Mudalige, M. Vernon, S. Jarvis

Abstract: This paper develops a plug-and-play reusable LogGP model that can be used to predict the runtime and scaling behavior of different MPI-based pipelined wavefront applications running on modern parallel platforms with multi-core nodes. A key new feature of the model is that it requires only a few simple input parameters to project performance for wavefront codes with different structure to the sweeps in each iteration as well as different behavior during each wavefront computation and/or between iterations. We apply the model to three key benchmark applications that are used in high performance computing procurement, illustrating that the model parameters yield insight into the key differences among the codes. We also develop new, simple and highly accurate models of MPI send, receive, and group communication primitives on the dual-core Cray XT system. We validate the reusable model applied to each benchmark on up to 8192 processors on the XT3/XT4. Results show excellent accuracy for all high performance application and platform configurations that we were able to measure. Finally, we use the model to assess application and hardware configurations, develop new metrics for procurement and configuration, identify bottlenecks, and assess new application design modifications that, to our knowledge, have not previously been explored.
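
To give a feel for the kind of estimate a LogGP-based wavefront model produces, here is a deliberately simplified sketch. It is not the paper's plug-and-play model: the function names, the assumption that each tile waits on two upstream messages, and the single-sweep pipeline formula are illustrative assumptions only. The message cost uses the standard LogGP terms (latency `L`, per-message overhead `o`, per-byte gap `G`).

```python
def loggp_message_time(k_bytes, L, o, G):
    """End-to-end time of one k-byte message under LogGP:
    send overhead + per-byte gaps + wire latency + receive overhead."""
    return o + (k_bytes - 1) * G + L + o


def sweep_time(px, py, tiles_per_proc, work_per_tile, msg_time):
    """Rough time for one wavefront sweep over a px-by-py processor grid.

    Assumptions (hypothetical, for illustration):
    - the wavefront needs (px - 1) + (py - 1) stages to reach the far corner,
      after which that processor still works through its own tiles;
    - every stage costs one tile of computation plus two messages.
    """
    stages = (px - 1) + (py - 1) + tiles_per_proc
    return stages * (work_per_tile + 2 * msg_time)
```

For example, with unit-free values `L=5`, `o=1`, `G=2` and an 11-byte message, `loggp_message_time` gives 27; a sweep over a 4-by-4 grid with 10 tiles per processor and 100 units of work per tile then takes 16 stages of 154 units each. The pipeline-fill term `(px - 1) + (py - 1)` is what makes the estimate grow with processor count even when the work per processor is fixed.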

Citations: 47