Proceedings International Parallel and Distributed Processing Symposium: Latest Articles

An executable analytical performance evaluation approach for early performance prediction
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213484
Adeline Jacquet, Vincent Janot, C. Leung, G. Gao, R. Govindarajan, T. Sterling
Abstract: Percolation has recently been proposed as a key component of an advanced program execution model for future-generation high-end machines featuring adaptive data/code transformation and movement for effective latency tolerance. An early evaluation of the performance effect of percolation is very important in the design space exploration of future generations of supercomputers. In this paper, we develop an executable analytical performance model of a high-performance multithreaded architecture that supports percolation. A novel feature of our approach is modeling interactions between software (program) and hardware (architecture) components. We solve the analytical model using a queuing simulation tool enriched with synchronization. The proposed approach is effective and facilitates obtaining performance trends quickly. Our results indicate that percolation brings significant performance gains (by a factor of 2.7 to 11). Further, our results reveal that percolation and multithreading can complement each other.
Citations: 15
An accurate and efficient parallel genetic algorithm to schedule tasks on a cluster
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213276
Michelle D. Moore
Abstract: Recent breakthroughs in the mathematical estimation of parallel genetic algorithm parameters by Cantu-Paz (2000) are applied to the NP-complete problem of scheduling multiple tasks on a cluster of computers connected by a shared bus. Experiments reveal that the parallel scheduling algorithm develops very accurate schedules when the parameter guidelines are used.
Citations: 26
Use of the parallel port to measure MPI intertask communication costs in COTS PC clusters
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213497
Maya Haridasan, G. H. Pfitscher
Abstract: Performance analysis of system time parameters is important for the development of parallel and distributed programs because it provides a means of estimating program execution times and is important for scheduling tasks on processors. Measuring time intervals between events occurring in different nodes of COTS clusters of workstations is not a trivial task due to the absence of a unified clock view. We propose a different approach to measuring system time parameters and program performance in clusters with the aid of the parallel port present in every machine of a COTS cluster. Some experimental values of communication delays using the MPI library in a Linux PC cluster are presented, and the efficiency and precision of the proposed mechanism are analyzed.
Citations: 2
System-level modeling of dynamically reconfigurable hardware with SystemC
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213321
A. Pelkonen, K. Masselos, M. Cupák
Abstract: To cope with the increasing demand for higher computational power and flexibility, dynamically reconfigurable blocks have become an important part of a system-on-chip. Several methods have been proposed to incorporate their reconfiguration aspects into a design flow. They all either lack an interface to commercially available and industrially used tools or are restricted to a single vendor or technology environment. Therefore, a methodology for modeling dynamically reconfigurable blocks at the system level using SystemC 2.0 is presented. The high-level model is based on a multi-context representation of the different functionalities that will be mapped onto the reconfigurable block during different run-time periods. By specifying the estimated times of context switching and active running in the selected functionality modes, the methodology allows true design space exploration at the system level, without the need to first map the design to an actual technology implementation.
Citations: 62
Reconfigurable mapping functions for online architectures
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213318
Shyamnath Harinath, R. Sass
Abstract: Content addressable memory is an expensive component in fixed architecture systems; however, it may prove to be a valuable tool in online architectures (that is, run-time reconfigurable systems with an online decision algorithm to determine the next reconfiguration). In this paper, we define a related problem called an arbitrary mapping function and describe an online architecture. We look at four implementations of an arbitrary mapping function component and compare them in terms of space (number of CLBs used), reconfiguration time, and component latency. All of the implementations offer low latency, which is the primary reason to use a content addressable memory or an arbitrary mapping function. Three of the implementations trade large size for very fast reconfiguration, while the last implementation is extremely conservative in space but has a large reconfiguration time.
Citations: 0
Trust modeling for peer-to-peer based computing systems
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213203
Farag Azzedin, Muthucumaru Maheswaran
Abstract: The peer-to-peer approach to designing large-scale systems has significant benefits, including scalability, low cost of ownership, robustness, and the ability to provide site autonomy. However, this approach has several drawbacks as well, including trust issues and a lack of coordination and control among the peers. We present a trust model for a peer-to-peer structured large-scale network computing system, completely define the trust model, and describe the schemes used in it. Central to the model is the idea of maintaining a recommender network that can be used to obtain references about a target domain. Simulation results indicate that the trust model is capable of building and maintaining trust and also of identifying the bad domains.
Citations: 43
Efficient collective operations using remote memory operations on VIA-based clusters
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213135
Rinku Gupta, P. Balaji, D. Panda, J. Nieplocha
Abstract: High-performance scientific applications require efficient and fast collective communication operations. Most collective communication operations have been built on top of point-to-point send/receive primitives. Modern user-level protocols such as VIA and the emerging InfiniBand architecture support remote DMA operations. These operations not only allow data to be moved between the nodes with low overhead but also allow the user to create and provide a logical shared memory address space across the nodes. This feature demonstrates potential for designing high-performance and scalable collective operations. In this paper, we discuss the various design issues that may be the basis of an RDMA-supported collective communication library. As a proof of concept, we have designed and implemented the RDMA-based broadcast and the RDMA-based allreduce operations. For RDMA-based broadcast, we get a benefit of 14% compared to send/receive-based broadcast for a 4KB data size on a 16-node cluster. We also introduce a new reduce algorithm called the degree-k tree-based reduce algorithm. Combining the RDMA mechanism with the new reduce algorithm shows a benefit of 38% for 4-byte messages and 9% for 4KB messages on a 16-node cluster for the allreduce operation. We also introduce analytical models for broadcast and allreduce to predict the performance of this design for large-scale clusters. These analytical models predict a performance benefit of about 35-40% for 4-byte messages and around 14% for 4KB messages on 512- and 1024-node clusters for the allreduce operation.
Citations: 29
Efficient on-the-fly data race detection in multithreaded C++ programs
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1145/781498.781529
Eli Poznianski, A. Schuster
Abstract: Data race detection is essential for debugging multithreaded programs and assuring their correctness. Nevertheless, there is no single universal technique capable of handling the task efficiently, since the data race detection problem is computationally hard in the general case. Thus, all currently available tools, when applied to some general-case program, usually result in excessive false alarms or in a large number of undetected races. Another major drawback of currently available tools is that they are restricted, for performance reasons, to detection units of fixed size. Thus, they all suffer from the same problem: choosing a small unit might result in missing some of the data races, while choosing a large one might lead to false detection. We present a novel testing tool, called MultiRace, which combines improved versions of Djit and Lockset, two very powerful on-the-fly algorithms for dynamic detection of apparent data races. Both extended algorithms detect races in multithreaded programs that may execute on weak consistency systems and may use two-way as well as global synchronization primitives. By employing novel technologies, MultiRace adjusts its detection to the native granularity of objects and variables in the program under examination. In order to monitor all accesses to each of the shared locations, MultiRace instruments the C++ source code of the program. It lets the user fine-tune the detection process, but otherwise is completely automatic and transparent. This paper describes the algorithms employed in MultiRace, gives highlights of its implementation issues, and suggests some optimizations. It shows that the overheads imposed by MultiRace are often much smaller (by orders of magnitude) than those obtained by other existing tools.
Citations: 119
A performance interface for component-based applications
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213500
S. Shende, A. Malony, C. Rasmussen, M. Sottile
Abstract: This work targets the emerging use of software component technology for high-performance scientific parallel and distributed computing. While component software engineering will benefit the construction of complex science applications, its use presents several challenges to performance optimization. A component application is composed of a set of components; thus, application performance depends on the interaction (possibly non-linear) of the component set. Furthermore, a component is a "binary unit of composition" and the only information users have is the interface the component provides to the outside world. An interface for component performance measurement and query is presented to address optimization issues. We describe the performance component design and an example demonstrating its use for runtime performance tuning.
Citations: 18
Choosing among alternative pasts
Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213516
M. Biberstein, E. Farchi, S. Ur
Abstract: The main problem with testing concurrent programs is their non-determinism: two executions of such a program may yield different results. The traditional solution is to identify and examine the race conditions. A different approach is that of generating different interleavings at runtime using embedded sleep statements. Advantages of this approach over the traditional one include its ability to identify more problems and the absence of false alarms. This paper proposes a totally different technique for the generation of interleavings. Operations on shared variables are tracked. Every time a shared variable is read, the read value is chosen among the values that the variable could hold in some interleaving consistent with the past observed events. The event timing restrictions are then updated based on the value chosen. The problem of identifying legal read values is far from simple because past value substitutions affect future ones. Our solution is computationally intensive and, therefore, impractical as is. However, insights gained from it lead to practical heuristics for operating the embedded sleep statements.
Citations: 10