Proceedings International Parallel and Distributed Processing Symposium: Latest Papers, Page 10

An executable analytical performance evaluation approach for early performance prediction

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213484

Adeline Jacquet, Vincent Janot, C. Leung, G. Gao, R. Govindarajan, T. Sterling

Citations: 15

An accurate and efficient parallel genetic algorithm to schedule tasks on a cluster

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213276

Michelle D. Moore

Citations: 26

Use of the parallel port to measure MPI intertask communication costs in COTS PC clusters

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213497

Maya Haridasan, G. H. Pfitscher

Citations: 2

System-level modeling of dynamically reconfigurable hardware with SystemC

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213321

A. Pelkonen, K. Masselos, M. Cupák

Citations: 62

Reconfigurable mapping functions for online architectures

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213318

Shyamnath Harinath, R. Sass

Citations: 0

Trust modeling for peer-to-peer based computing systems

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213203

Farag Azzedin, Muthucumaru Maheswaran

Citations: 43

Efficient collective operations using remote memory operations on VIA-based clusters

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213135

Rinku Gupta, P. Balaji, D. Panda, J. Nieplocha

Abstract: High performance scientific applications require efficient and fast collective communication operations. Most collective communication operations have been built on top of point-to-point send/receive primitives. Modern user-level protocols such as VIA and the emerging InfiniBand architecture support remote DMA operations. These operations not only allow data to be moved between the nodes with low overhead but also allow the user to create and provide a logical shared memory address space across the nodes. This feature demonstrates potential for designing high performance and scalable collective operations. In this paper, we discuss the various design issues that may be the basis of an RDMA-supported collective communication library. As a proof of concept, we have designed and implemented the RDMA-based broadcast and the RDMA-based allreduce operations. For RDMA-based broadcast, we obtain a benefit of 14% compared to send/receive-based broadcast for a 4KB data size on a 16-node cluster. We also introduce a new reduce algorithm called the Degree-k tree-based reduce algorithm. Combining the RDMA mechanism with the new reduce algorithm shows a benefit of 38% for 4-byte messages and 9% for 4KB messages on a 16-node cluster for the allreduce operation. We also introduce analytical models for broadcast and allreduce to predict the performance of this design for large-scale clusters. These analytical models predict a performance benefit of about 35-40% for 4-byte messages and around 14% for 4KB messages on 512- and 1024-node clusters for the allreduce operation.

Citations: 29

Efficient on-the-fly data race detection in multithreaded C++ programs

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1145/781498.781529

Eli Poznianski, A. Schuster

Abstract: Data race detection is essential for debugging multithreaded programs and assuring their correctness. Nevertheless, there is no single universal technique capable of handling the task efficiently, since the data race detection problem is computationally hard in the general case. Thus, all currently available tools, when applied to a general-case program, usually result in excessive false alarms or in a large number of undetected races. Another major drawback of currently available tools is that they are restricted, for performance reasons, to detection units of fixed size. Thus, they all suffer from the same problem: choosing a small unit might result in missing some of the data races, while choosing a large one might lead to false detection. We present a novel testing tool, called MultiRace, which combines improved versions of Djit and Lockset, two very powerful on-the-fly algorithms for dynamic detection of apparent data races. Both extended algorithms detect races in multithreaded programs that may execute on weak consistency systems, and may use two-way as well as global synchronization primitives. By employing novel technologies, MultiRace adjusts its detection to the native granularity of objects and variables in the program under examination. In order to monitor all accesses to each of the shared locations, MultiRace instruments the C++ source code of the program. It lets the user fine-tune the detection process, but otherwise is completely automatic and transparent. This paper describes the algorithms employed in MultiRace, highlights its implementation issues, and suggests some optimizations. It shows that the overheads imposed by MultiRace are often much smaller (orders of magnitude) than those obtained by other existing tools.

Citations: 119
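
For readers unfamiliar with the Lockset algorithm named in the abstract above, the following is a minimal sketch of the basic lockset-refinement idea: each shared variable keeps a set of candidate locks, which is intersected with the locks the accessing thread holds, and a warning is raised once that set becomes empty. This is not MultiRace's improved version, and it is not taken from the paper; the class `LocksetChecker`, its method names, and the thread, lock, and variable labels are all hypothetical and chosen only for illustration. The sketch also omits the initialization and read-sharing refinements that practical detectors add.

```python
from typing import Dict, List, Set, Tuple


class LocksetChecker:
    """Simplified lockset refinement (illustrative only, not MultiRace)."""

    def __init__(self) -> None:
        self.held: Dict[str, Set[str]] = {}        # thread -> locks currently held
        self.candidates: Dict[str, Set[str]] = {}  # variable -> candidate locks
        self.warnings: List[Tuple[str, str]] = []  # (thread, variable) reports

    def acquire(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread: str, var: str) -> None:
        held = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held right now is a candidate.
            self.candidates[var] = set(held)
        else:
            # Later accesses: keep only locks held on every access so far.
            self.candidates[var] &= held
        if not self.candidates[var]:
            # No single lock protects all accesses: potential data race.
            self.warnings.append((thread, var))


checker = LocksetChecker()

# Thread T1 touches x and y while holding lock m.
checker.acquire("T1", "m")
checker.access("T1", "x")
checker.access("T1", "y")
checker.release("T1", "m")

# Thread T2 touches x under the same lock m: x stays consistently protected.
checker.acquire("T2", "m")
checker.access("T2", "x")
checker.release("T2", "m")

# Thread T2 touches y under a different lock n: the candidate set empties.
checker.acquire("T2", "n")
checker.access("T2", "y")
checker.release("T2", "n")

print(checker.warnings)
```

In this run only `y` is flagged, because no single lock was held across all of its accesses, while `x` keeps `m` as its candidate lock.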

A performance interface for component-based applications

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213500

S. Shende, A. Malony, C. Rasmussen, M. Sottile

Citations: 18

Choosing among alternative pasts

Proceedings International Parallel and Distributed Processing Symposium, Pub Date: 2003-04-22, DOI: 10.1109/IPDPS.2003.1213516

M. Biberstein, E. Farchi, S. Ur

Citations: 10