2008 IEEE International Symposium on Parallel and Distributed Processing: Latest Papers

Large-scale experiment of co-allocation strategies for Peer-to-Peer supercomputing in P2P-MPI
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536212
S. Genaud, Choopan Rattanapoka
Abstract: High performance computing generally involves parallel applications deployed on the multiple resources used for the computation. The problem of scheduling the application across distributed resources is termed co-allocation. In a grid context, co-allocation is difficult since the grid middleware must face a dynamic environment. Middleware architectures on a peer-to-peer (P2P) basis have been proposed to tackle most limitations of centralized systems. Some of the issues addressed by P2P systems are fault tolerance, ease of maintenance, and scalability in resource discovery. However, the lack of global knowledge makes scheduling difficult in P2P systems. In this paper, we present the new developments concerning locality awareness as well as co-allocation strategies available in the latest release of P2P-MPI. i) The spread strategy tries to map processes on hosts so as to maximize the total amount of available memory while maintaining locality of processes as a secondary objective. ii) The concentrate strategy tries to maximize locality between processes by using as many cores as hosts offer. The co-allocation scheme has been devised to be simple for the user and meets the main high performance computing requirement, which is locality. Extensive experiments have been conducted on Grid5000 with up to 600 processes on 6 sites throughout France. Results show that we achieved the targeted goals in these real conditions.
Citations: 20
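The two strategies named in the abstract can be illustrated with a small sketch. This is not P2P-MPI's implementation: the `Host` record, its `latency_ms` field (used here as an assumed measure of locality), and the function names `concentrate` and `spread` are hypothetical, chosen only to show the difference between filling cores on nearby hosts and placing one process per host.

```python
from typing import NamedTuple


class Host(NamedTuple):
    name: str
    cores: int
    latency_ms: float  # assumed locality measure: lower means closer


def concentrate(hosts, n_procs):
    """Fill every core of the closest hosts first (favor locality)."""
    placement = []
    for host in sorted(hosts, key=lambda h: h.latency_ms):
        take = min(host.cores, n_procs - len(placement))
        placement.extend([host.name] * take)
        if len(placement) == n_procs:
            break
    if len(placement) < n_procs:
        raise ValueError("not enough cores for the requested processes")
    return placement


def spread(hosts, n_procs):
    """Place one process per host per round, closest hosts first,
    so each process shares its host's memory with as few others as possible."""
    ordered = sorted(hosts, key=lambda h: h.latency_ms)
    placement = []
    round_no = 0
    while len(placement) < n_procs:
        progressed = False
        for host in ordered:
            # A host can accept a process in this round only if it still has a free core.
            if host.cores > round_no and len(placement) < n_procs:
                placement.append(host.name)
                progressed = True
        if not progressed:
            raise ValueError("not enough cores for the requested processes")
        round_no += 1
    return placement


hosts = [Host("a", 2, 1.0), Host("b", 2, 5.0), Host("c", 2, 9.0)]
concentrated = concentrate(hosts, 3)
spread_out = spread(hosts, 3)
```

With three dual-core hosts and three processes, the concentrate sketch uses two hosts while the spread sketch uses all three, which is the trade-off between locality and per-process memory that the abstract describes.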
Enhancing application robustness through adaptive fault tolerance
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536383
Z. Lan, Yawei Li, Ziming Zheng, Prashasta Gujrati
Abstract: As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adaptive fault tolerance system for HPC applications. It aims to enable parallel applications to avoid anticipated failures via preventive migration, and in the case of unforeseeable failures, to minimize their impact through selective checkpointing. Both prior and ongoing work are summarized in this paper.
Citations: 8
Modeling and predicting application performance on parallel computers using HPC challenge benchmarks
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536278
W. Pfeiffer, N. Wright
Abstract: A method is presented for modeling application performance on parallel computers in terms of the performance of microkernels from the HPC Challenge benchmarks. Specifically, the application run time is expressed as a linear combination of inverse speeds and latencies from microkernels or system characteristics. The model parameters are obtained by an automated series of least squares fits using backward elimination to ensure statistical significance. If necessary, outliers are deleted to ensure that the final fit is robust. Typically three or four terms appear in each model: at most one each for floating-point speed, memory bandwidth, interconnect bandwidth, and interconnect latency. Such models allow prediction of application performance on future computers from easier-to-make predictions of microkernel performance. The method was used to build models for four benchmark problems involving the PARATEC and MILC scientific applications. These models not only describe performance well on the ten computers used to build the models, but also do a good job of predicting performance on three additional computers with newer design features. For the four application benchmark problems with six predictions each, the relative root mean squared error in the predicted run times varies between 13 and 16%. The method was also used to build models for the HPL and G-FFTE benchmarks in HPCC, including functional dependences on problem size and core count from complexity analysis. The model for HPL predicts performance even better than the application models do, while the model for G-FFTE systematically underpredicts run times.
Citations: 34
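The form of the model in this abstract, run time as a linear combination of inverse speeds and latencies, can be sketched as follows. The term names, the function names, and the particular definition of relative root mean squared error are assumptions for illustration; the abstract does not give the fitted coefficients or the exact error formula, and the least squares fit with backward elimination is not reproduced here.

```python
import math


def predict_runtime(coeffs, metrics):
    """Run time as a linear combination: sum of coefficient * metric,
    where each metric is an inverse speed or a latency for one machine."""
    return sum(coeffs[name] * metrics[name] for name in coeffs)


def relative_rms_error(predicted, measured):
    """Root mean square of the relative errors (predicted - measured) / measured.
    This is one common definition, assumed here."""
    rel = [(p - m) / m for p, m in zip(predicted, measured)]
    return math.sqrt(sum(r * r for r in rel) / len(rel))


# Hypothetical coefficients (work per term) and machine metrics (seconds per unit of work).
coeffs = {"inv_flop_rate": 2.0e12, "inv_mem_bandwidth": 4.0e11, "net_latency": 1.0e6}
machine = {"inv_flop_rate": 1.0e-10, "inv_mem_bandwidth": 2.5e-10, "net_latency": 5.0e-6}
estimate = predict_runtime(coeffs, machine)
```

Each coefficient can be read as the amount of work of one kind (floating-point operations, bytes moved, messages sent), so predicting run time on a new machine only requires that machine's microkernel measurements.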
Experiences in scaling scientific applications on current-generation quad-core processors
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536342
K. Barker, K. Davis, A. Hoisie, D. Kerbyson, M. Lang, S. Pakin, J. Sancho
Abstract: In this work we present an initial performance evaluation of AMD and Intel's first quad-core processor offerings: the AMD Barcelona and the Intel Xeon X7350. We examine the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing clusters. Our analysis of intra-processor and intra-node scalability of microbenchmarks and a range of large-scale scientific applications indicates that quad-core processors can deliver an improvement in performance of up to 4x per processor, but that this is heavily dependent on the workload being processed. While the Intel processor has a higher clock rate and peak performance, the AMD processor has higher memory bandwidth and intra-node scalability. The scientific applications we analyzed exhibit a range of performance improvements from only 3x up to the full 16x speed-up over a single core. Also, we note that the maximum node performance is not necessarily achieved by using all 16 cores.
Citations: 14
Self-stabilizing algorithms for sorting and heapification
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536327
Doina Bein, A. Datta, L. Larmore
Abstract: We present two space and time efficient asynchronous distributed self-stabilizing algorithms. The first sorts an oriented chain network and the second heapifies a rooted tree network. The time complexity of both solutions is linear in terms of the number of nodes (for the chain) and the height (for the tree). The chain sorting algorithm uses O(m) bits per process, where m represents the number of bits required to store any value in the network. The heapify algorithm needs O(m · D) bits per process, where D is the degree of the tree.
Citations: 1
Facilitating efficient synchronization of asymmetric threads on hyper-threaded processors
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536358
Nikos Anastopoulos, N. Koziris
Abstract: So far, the privileged instructions MONITOR and MWAIT, introduced with the Intel Prescott core, have been used mostly for inter-thread synchronization in operating systems code. In a hyper-threaded processor, these instructions offer a "performance-optimized" way for threads involved in synchronization events to wait on a condition. In this work, we explore the potential of using these instructions for synchronizing application threads that execute on hyper-threaded processors and are characterized by workload asymmetry. Initially, we propose a framework through which one can use MONITOR/MWAIT to build condition wait and notification primitives, with minimal kernel involvement. Then, we evaluate the efficiency of these primitives in a bottom-up manner: at first, we quantify certain performance aspects of the primitives that reflect the execution model under consideration, such as resource consumption and responsiveness, and we compare them against other commonly used implementations. As a further step, we use our primitives to build synchronization barriers. Again, we examine the same performance issues as before, and using a pseudo-benchmark we evaluate the efficiency of our implementation for fine-grained inter-thread synchronization. In terms of throughput, our barriers yielded 12% better performance on average compared to Pthreads, and 26% compared to a spin-loops-based implementation, for varying levels of thread asymmetry. Finally, we test our barriers in a real-world scenario, specifically in applying thread-level Speculative Precomputation on four applications. For this multithreaded execution scheme, our implementation provided up to 7% better performance compared to Pthreads, and up to 40% compared to spin-loops-based barriers.
Citations: 24
Parallel mining of closed quasi-cliques
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536250
Yuzhou Zhang, Jianyong Wang, Zhiping Zeng, Lizhu Zhou
Abstract: Graph structure can model the relationships among a set of objects. Mining quasi-clique patterns from large dense graph data makes sense with respect to both statistics and applications. The applications of frequent quasi-cliques include stock price correlation discovery, gene function prediction and protein molecular analysis. Although the graph mining community has devised many techniques to accelerate the discovery process, mining time is often unacceptable, especially on large dense graph data with a low support threshold. Therefore, parallel algorithms are desirable for mining quasi-clique patterns. Message passing is one of the most widely used parallel frameworks. In this paper, we parallelize the state-of-the-art closed quasi-clique mining algorithm called Cocain using message passing. The parallelized version of Cocain can achieve 30+ fold speedup on 32 processors in a cluster of SMPs. The techniques proposed in this work can be applied to parallelize other pattern-growth based frequent pattern mining algorithms.
Citations: 5
Mobility control schemes with quick convergence in wireless sensor networks
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536119
Xiao Chen, Zhen Jiang, Jie Wu
Abstract: In the near future, wireless sensor networks (WSN) performing sensing and communication tasks will be widely deployed as technology rapidly advances. Communication is one of the essential functionalities of these networks, while power and computation resources in each sensor are limited. Recently, attention has been drawn to using mobility control to minimize energy consumption in wireless sensor networks. In this paper, we provide quickly converging mobility control schemes to achieve optimal configuration in a single data flow. The key idea of our schemes is to use the optimal location information of each relay node as a guide for node movement while maintaining the connectivity of relay nodes along the data flow. Experimental results show that our schemes can speed up the convergence process to nearly the optimal and reduce its cost almost to the minimum, compared with the best results known to date.
Citations: 13
LiteLoad: Content unaware routing for localizing P2P protocols
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536204
Shay Horovitz, D. Dolev
Abstract: In today's extensive worldwide Internet traffic, some 60% of network congestion is caused by peer-to-peer sessions. Consequently, ISPs face many challenges: paying for the added traffic requirement, poor customer satisfaction due to degraded broadband experience, purchasing costly backbone links and upstream bandwidth, and difficulty in effectively controlling P2P traffic with conventional devices. Existing solutions such as caching and indexing of P2P content are controversial, as their legality is uncertain due to copyright violation, and they are therefore rarely installed by ISPs. In addition, these solutions are not capable of handling existing encrypted protocols, which are on the rise in popular P2P networks. Other solutions that employ traffic shaping and blocking degrade the downloading throughput and cause end users to switch ISPs for a better service. LiteLoad discerns patterns of user communications in peer-to-peer file sharing networks without identifying the content being requested or transferred, and uses least-cost routing rules to push peer-to-peer transfers into confined network segments. This approach maintains the performance of file transfer, as opposed to traffic shaping solutions, and precludes Internet provider involvement in caching, cataloguing or indexing of the shared content. Simulation results demonstrate the potential of the solution, and a proof of concept of the key technology is demonstrated on popular protocols, including encrypted ones.
Citations: 10
A plug-and-play model for evaluating wavefront computations on parallel architectures
2008 IEEE International Symposium on Parallel and Distributed Processing Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536243
G. Mudalige, M. Vernon, S. Jarvis
Abstract: This paper develops a plug-and-play reusable LogGP model that can be used to predict the runtime and scaling behavior of different MPI-based pipelined wavefront applications running on modern parallel platforms with multi-core nodes. A key new feature of the model is that it requires only a few simple input parameters to project performance for wavefront codes with different structure to the sweeps in each iteration as well as different behavior during each wavefront computation and/or between iterations. We apply the model to three key benchmark applications that are used in high performance computing procurement, illustrating that the model parameters yield insight into the key differences among the codes. We also develop new, simple and highly accurate models of MPI send, receive, and group communication primitives on the dual-core Cray XT system. We validate the reusable model applied to each benchmark on up to 8192 processors on the XT3/XT4. Results show excellent accuracy for all high performance application and platform configurations that we were able to measure. Finally, we use the model to assess application and hardware configurations, develop new metrics for procurement and configuration, identify bottlenecks, and assess new application design modifications that, to our knowledge, have not previously been explored.
Citations: 47