Proceedings Sixth Heterogeneous Computing Workshop (HCW'97): latest papers, page 2

Supporting fault-tolerance in heterogeneous distributed applications

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581421

P. Maheshwari, J. Ouyang

Abstract: Heterogeneous computing opens up new challenges and opportunities in fields such as parallel and distributed processing, design of algorithms for applications, scheduling of parallel tasks, interconnection network technology and support for reliable distributed heterogeneous computing. A trend of supporting fault-tolerance in distributed computing systems is to incorporate fault-tolerance into applications at low cost, in terms of both run time performance and programming effort required to construct reliable application software. We present an approach for developing efficient reliable distributed applications for heterogeneous computing systems. We propose a library prototype, called H-Libra, to support fault-tolerance in heterogeneous systems with low run-time cost. Fault-tolerance is based on distributed consistent checkpointing and rollback-recovery integrated with a user-level network communication protocol. By employing novel mechanisms, minimum communication overhead is involved for taking a consistent distributed checkpoint and catching messages in transit during a checkpoint. By providing fault-tolerance transparency and a simple, easy to use high-level message-passing interface, H-Libra simplifies the development of reliable heterogeneous distributed applications.
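The idea of a consistent checkpoint that also captures messages in transit can be illustrated with a toy, single-threaded simulation. This is only a sketch under strong assumptions: the `Process` and `Cluster` classes and all of their methods are hypothetical names invented here, the snapshot is taken atomically (a real distributed system needs a coordination protocol for that), and none of this reflects H-Libra's actual interface or mechanisms.

```python
from collections import deque


class Process:
    """Toy process whose state is a running total of the values it has consumed."""

    def __init__(self):
        self.total = 0
        self.inbox = deque()  # messages sent to this process but not yet consumed


class Cluster:
    """Hypothetical in-memory stand-in for a set of communicating processes."""

    def __init__(self, n):
        self.procs = [Process() for _ in range(n)]
        self._saved = None

    def send(self, dst, value):
        # The message is "in transit" until the destination consumes it.
        self.procs[dst].inbox.append(value)

    def receive(self, pid):
        proc = self.procs[pid]
        if proc.inbox:
            proc.total += proc.inbox.popleft()

    def checkpoint(self):
        # Save every process state together with its in-transit messages,
        # so that no message is lost or counted twice after a rollback.
        self._saved = [(p.total, tuple(p.inbox)) for p in self.procs]

    def rollback(self):
        if self._saved is None:
            raise RuntimeError("no checkpoint has been taken")
        for proc, (total, inbox) in zip(self.procs, self._saved):
            proc.total = total
            proc.inbox = deque(inbox)

    def accounted(self):
        # Sum of everything consumed plus everything still in transit.
        return sum(p.total + sum(p.inbox) for p in self.procs)
```

In this sketch, work done after `checkpoint()` is discarded by `rollback()`, while a message that was in transit at checkpoint time is restored to the destination's inbox rather than lost.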

Citations: 3

Stochastic Petri nets applied to the performance evaluation of static task allocations in heterogeneous computing environments

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581420

A. McSpadden, N. Lopez-Benitez

Citations: 14

On-line use of off-line derived mappings for iterative automatic target recognition tasks and a particular class of hardware platforms

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581413

J. Budenske, R. Ramanujan, H. Siegel

Citations: 10

Exploiting multiple heterogeneous networks to reduce communication costs in parallel programs

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581412

JunSeong Kim, D. Lilja

Abstract: The different types of messages used by a parallel application program executing in a distributed system can each have unique characteristics so that no single communication network can produce the lowest latency for all messages. For instance, short control messages may be sent with the lowest overhead on one type of network, such as Ethernet, while bulk data transfers may be better suited to a different type of network, such as Fibre Channel or HiPPI. In this paper, we investigate how to exploit multiple heterogeneous communication networks that interconnect the same set of processing nodes by dynamically selecting the best (lowest latency) network for each message based on the message size. We also show how to aggregate these multiple parallel networks into a single virtual network to further reduce the latency and increase the available bandwidth. We test this multiplexing and aggregation on a cluster of SGI multiprocessors interconnected with both Fibre Channel and Ethernet. We find that multiplexing between Ethernet and Fibre Channel can substantially reduce communication overhead in a synthetic benchmark compared to using either network alone. Aggregating these two networks into a single virtual network can further reduce communication delays for applications with many large messages. The best choice of either multiplexing or aggregation depends on the mix of message sizes in application program and the relative overheads of the two networks.
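The size-based network selection described in this abstract can be sketched with a simple linear cost model, latency = fixed overhead + size / bandwidth. Everything below is an illustrative assumption rather than the paper's method: the `Network` class, the function names, and the overhead and bandwidth figures are invented for the example, and the aggregation function assumes one message can be split freely across networks so that all pieces finish at the same time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    overhead_us: float     # fixed per-message cost in microseconds (hypothetical)
    bandwidth_mb_s: float  # sustained bandwidth in megabytes per second (hypothetical)

    def latency_us(self, size_bytes):
        # 1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) is in microseconds.
        return self.overhead_us + size_bytes / self.bandwidth_mb_s


def pick_network(networks, size_bytes):
    """Multiplexing: send the whole message on the lowest-latency network."""
    return min(networks, key=lambda n: n.latency_us(size_bytes))


def aggregate_latency_us(networks, size_bytes):
    """Aggregation: split one message so every used network finishes together.

    Solving o_i + s_i / b_i = T with sum(s_i) = S gives
    T = (S + sum(o_i * b_i)) / sum(b_i). A network whose overhead exceeds T
    would receive a negative share, so it is dropped and T is recomputed.
    """
    active = sorted(networks, key=lambda n: n.overhead_us)
    while True:
        total_bw = sum(n.bandwidth_mb_s for n in active)
        weighted = sum(n.overhead_us * n.bandwidth_mb_s for n in active)
        t = (size_bytes + weighted) / total_bw
        if len(active) == 1 or t >= active[-1].overhead_us:
            return t
        active.pop()  # drop the highest-overhead network and try again


# Made-up figures chosen only to show a crossover between the two networks.
EXAMPLE_NETWORKS = [
    Network("ethernet", overhead_us=200.0, bandwidth_mb_s=1.0),
    Network("fibre_channel", overhead_us=1000.0, bandwidth_mb_s=25.0),
]
```

With these made-up figures the two latency lines cross at roughly 833 bytes: smaller messages go to the low-overhead network, larger ones to the high-bandwidth network, and aggregation only helps once a message is large enough to amortize the higher overhead.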

Citations: 14

Dynamic load balancing of distributed SPMD computations with explicit message-passing

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581406

M. Cermele, M. Colajanni, G. Necci

Abstract: Distributed systems have the potentiality of becoming an alternative platform for parallel computations. However, there are still many obstacles to overcome, one of the most serious is that distributed systems typically consist of shared heterogeneous components with highly variable computational power. We present a load balancing support that checks the load status and, if necessary, adapts the workload to dynamic platform conditions through data migrations from overloaded to underloaded nodes. Unlike task migration supports for task parallelism and other data migration frameworks for master/slave-based parallel applications, our support works for the entire class of SPMD regular applications with explicit communications such as linear algebra problems, partial differential equation solvers, image processing algorithms. Although we considered several variants (three activation mechanisms, three load monitoring techniques and four decision policies), we implemented only the protocols that guarantee program consistency. The efficiency of the strategies is tested in the instance of two SPMD algorithms that are based on the PVM library enriched by special-purpose primitives for data management. As additional contribution, our research keeps the entire support for dynamic load balancing transparent to the programmer. The only visible interface of our support is the activation phase.
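Data migration from overloaded to underloaded nodes can be illustrated with a small planning routine. This is a sketch under assumptions not taken from the paper: node speed is a single measured number, the ideal share of data items is proportional to speed, and any node may send to any other. The function names are hypothetical, and the sketch ignores the data-locality and consistency constraints that a real SPMD support would have to respect.

```python
def target_shares(total_items, speeds):
    """Integer shares proportional to speed that sum exactly to total_items."""
    total_speed = sum(speeds)
    exact = [total_items * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]  # round down first
    leftover = total_items - sum(shares)
    # Hand the remaining items to the nodes with the largest fractional parts.
    by_remainder = sorted(
        range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def plan_migrations(current, speeds):
    """Return (source, destination, amount) moves that reach the target shares."""
    targets = target_shares(sum(current), speeds)
    surplus = [[i, c - t] for i, (c, t) in enumerate(zip(current, targets)) if c > t]
    deficit = [[i, t - c] for i, (c, t) in enumerate(zip(current, targets)) if t > c]
    moves = []
    si = di = 0
    while si < len(surplus) and di < len(deficit):
        amount = min(surplus[si][1], deficit[di][1])
        moves.append((surplus[si][0], deficit[di][0], amount))
        surplus[si][1] -= amount
        deficit[di][1] -= amount
        if surplus[si][1] == 0:
            si += 1  # this overloaded node has shed its excess
        if deficit[di][1] == 0:
            di += 1  # this underloaded node is now full
    return moves
```

For example, four nodes holding 100 items each with relative speeds 1, 1, 2 and 4 would be rebalanced by moving 50 items from each of the two slowest nodes to the fastest one.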

Citations: 22

Practical issues in heterogeneous processing systems for military applications

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581418

G. Ladd

Citations: 2

A performance and portability study of parallel applications using a distributed computing testbed

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581423

V. Morariu, Mathew Cunningham, Mark Letterman

Abstract: A case study was conducted to examine the performance and portability of parallel applications, with an emphasis on data transfer among the processors in heterogeneous environments. Several parallel test programs using MPICH, a message passing interface (MPI) library, and the Linda parallel environment were developed to analyze communication performance and portability. These programs implement loosely and tightly synchronized communication models in which each processor exchanges data with two other processors. This data-exchange pattern mimics communication in certain parallel applications using striped partitioning of the computational domain. Tests were performed on an isolated, distributed computing testbed, a live development network and a symmetrical multiprocessing computer system. All network configurations used asynchronous transfer mode (ATM) network technologies. The testbed used in the study was a heterogeneous network consisting of various workstations and networking equipment. This paper presents an analysis of the results and recommendations for designing and implementing coarse-grained, parallel, scientific applications.

Citations: 1

Optimal task assignment in heterogeneous computing systems

Proceedings Sixth Heterogeneous Computing Workshop (HCW'97) Pub Date : 1997-04-01 DOI: 10.1109/HCW.1997.581416

Muhammad Kafil, I. Ahmad

Abstract: Distributed systems comprising networked heterogeneous workstations are now considered to be a viable choice for high-performance computing. For achieving a fast response time from such systems, an efficient assignment of the application tasks to the processors is imperative. The general assignment problem is known to be NP-hard, except in a few special cases with strict assumptions. While a large number of heuristic techniques have been suggested in the literature that can yield sub-optimal solutions in a reasonable amount of time, we aim to develop techniques for optimal solutions under relaxed assumptions. The basis of our research is a best-first search technique known as the A* algorithm from the area of artificial intelligence. The original search technique guarantees an optimal solution but is not feasible for problems of practically large sizes due to its high time and space complexity. We propose a number of algorithms based around the A* technique. The proposed algorithms also yield optimal solutions but are considerably faster. The first algorithm solves the assignment problem by using parallel processing. Parallelizing the assignment algorithm is a natural way to lower the time complexity, and we believe our algorithm to be novel in this regard. The second algorithm is based on a clustering based pre-processing technique that merges the high affinity tasks. Clustering reduces the problem size, which in turn reduces the state-space for the assignment algorithm. We also propose three heuristics which do not guarantee optimal solutions but provide near-optimal solutions and are considerably faster. By using our parallel formulation, the proposed clustering technique and the heuristics can also be parallelized to further improve their time complexity.
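A sequential A* search for task assignment can be sketched as follows. The cost model here is an assumption chosen for brevity and may differ from the one used in the paper: the objective is the sum of all execution costs plus a communication cost for every pair of tasks placed on different processors. The function names and the sample matrices are hypothetical. The heuristic (cheapest possible execution cost of every unassigned task) never overestimates, so the first complete assignment popped from the queue is optimal under this model.

```python
import heapq
from itertools import count, product


def assignment_cost(assign, exec_cost, comm_cost):
    """Total cost of a complete assignment (assign[t] is the processor of task t)."""
    cost = sum(exec_cost[t][p] for t, p in enumerate(assign))
    for t in range(len(assign)):
        for u in range(t + 1, len(assign)):
            if assign[t] != assign[u]:
                cost += comm_cost[t][u]
    return cost


def astar_assign(exec_cost, comm_cost):
    """Return (optimal cost, assignment tuple) for non-empty cost matrices."""
    n_tasks, n_procs = len(exec_cost), len(exec_cost[0])
    # suffix[k] = lower bound on the cost still to pay once tasks 0..k-1 are placed.
    suffix = [0] * (n_tasks + 1)
    for t in range(n_tasks - 1, -1, -1):
        suffix[t] = suffix[t + 1] + min(exec_cost[t])
    tie = count()  # tie-breaker so the heap never has to compare tuples of tuples
    heap = [(suffix[0], next(tie), 0, ())]  # entries are (f, tie, g, partial assignment)
    while heap:
        _, _, g, assign = heapq.heappop(heap)
        k = len(assign)  # index of the next task to place
        if k == n_tasks:
            return g, assign
        for p in range(n_procs):
            # Communication with already placed tasks that sit on another processor.
            comm = sum(comm_cost[u][k] for u, q in enumerate(assign) if q != p)
            g2 = g + exec_cost[k][p] + comm
            heapq.heappush(heap, (g2 + suffix[k + 1], next(tie), g2, assign + (p,)))
    raise ValueError("search space exhausted without a complete assignment")


def brute_force_cost(exec_cost, comm_cost):
    """Exhaustive cross-check, usable only for very small instances."""
    n_tasks, n_procs = len(exec_cost), len(exec_cost[0])
    return min(
        assignment_cost(a, exec_cost, comm_cost)
        for a in product(range(n_procs), repeat=n_tasks)
    )


# Hypothetical instance: 3 tasks, 2 processors, symmetric communication costs.
EXEC_COST = [[4, 9], [8, 3], [5, 6]]
COMM_COST = [[0, 2, 7], [2, 0, 1], [7, 1, 0]]
```

Because tasks are placed in a fixed order, each partial assignment is reached by exactly one path, so no closed set is needed. The memory held by the priority queue still grows quickly with problem size, which is the limitation the abstract attributes to the original technique.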

Citations: 43