{"title":"Design of multicast packet switches for high-speed multi-service networks","authors":"K. Ravindran","doi":"10.1109/HPDC.1996.546228","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546228","url":null,"abstract":"The paper describes a multicast switch architecture for multi-service networks that supports multi-destination packet delivery at high data transfer rates (≈150 Mb/s for full-motion video) and allows a large aggregate data-carrying capacity (≈1000 Mb/s). The switch architecture is made extensible by adopting a network-oriented design, whereby the switch functions are cast in terms of the requirements of a canonical network model for packet multicasting. These requirements are routing and priority-based scheduling of packets from the input link to the output link(s) of each multicast channel segment supported by a switch. Packet routing can be implemented efficiently in hardware by maintaining information about all channel segments supported by the switch in a fast associative store. Our architecture yields high switching efficiency by using high-speed link processors, a distributed associative store, and parallel execution of routing and scheduling activities. The paper describes various functional elements of the switch architecture and identifies the performance boundaries of switch realization on high-speed processor and communication components.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126745106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meeting QOS guarantees by end-to-end QOS monitoring and adaptation","authors":"Jean-François Huard, I. Inoue, A. Lazar, H. Yamanaka","doi":"10.1109/HPDC.1996.546205","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546205","url":null,"abstract":"The design and implementation of the transport layer of a native ATM (asynchronous transfer mode) protocol stack, and its embedding into an overall architecture that provides end-to-end quality of service (QOS), are presented. Within this architecture, the typical transport functionalities are extended with QOS monitoring and adaptation mechanisms. A QOS-based API (application programming interface) is proposed that shields application programmers from the complexity of QOS management and control. Both unicast and multicast connections supporting interactive multimedia applications are considered.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121328687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A massively parallel fast multipole algorithm in three dimensions","authors":"E. Lu, D. Okunbor","doi":"10.1109/HPDC.1996.546172","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546172","url":null,"abstract":"The simulation of many-body, many-particle systems has a wide range of applications in areas such as biophysics, chemistry, and astrophysics. It is known that the force calculation accounts for 90% of the simulation time. This is mainly because the total number of interactions in the force calculation is O(N^2), where N is the number of particles in the system. The fast multipole algorithm proposed by Greengard and Rokhlin (1987) reduces the time complexity to O(N). In this paper, we design an efficient parallel fast multipole algorithm in 3D. For portability, our parallel program is implemented using the Message Passing Interface. Is it possible to obtain high performance for a computationally intensive application using a LAN of workstations? In this paper, we attempt to answer this question, which is commonly asked by researchers who have no access to parallel computers or supercomputers.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116720494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customized dynamic load balancing for a network of workstations","authors":"Mohammed J. Zaki, Wei Li, S. Parthasarathy","doi":"10.1109/HPDC.1996.546198","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546198","url":null,"abstract":"Load balancing involves assigning to each processor work proportional to its performance, thereby minimizing the execution time of the program. Although static load balancing can solve many problems (e.g., those caused by processor heterogeneity and non-uniform loops) for most regular applications, the transient external load due to multiple users on a network of workstations necessitates a dynamic approach to load balancing. We examine the behavior of global vs. local, and centralized vs. distributed, load balancing strategies. We show that different schemes are best for different applications under varying program and system parameters. Therefore, customized load balancing schemes become essential for good performance. We present a hybrid compile-time and run-time modeling and decision process that selects (customizes) the best scheme, along with automatic generation of parallel code with calls to a run-time library for load balancing.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128007868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of caching proxy mechanisms realized on wide area distributed networks","authors":"M. Oguchi, K. Ono","doi":"10.1109/HPDC.1996.546215","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546215","url":null,"abstract":"Information retrieval systems on wide area distributed networks, such as the World-Wide Web (WWW), have become popular with an extremely large number of users. The caching proxy plays an important role in these systems by improving their accessibility and serviceability. This paper discusses the caching proxy mechanism. First, the role and structure of the caching proxy are explained, and two major problems of existing systems are pointed out. Our solutions to these problems are then proposed. To overcome the first problem, we propose changing the file size by controlling the quality level of cached multimedia data. To overcome the second, we propose forming a cluster among neighboring caching proxies using hyperlink information. Finally, an improved caching proxy mechanism based on these ideas is presented.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134624299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and distributed genetic algorithm for ordering problems","authors":"Anup Kumar, A. Srivastava, A. Singru, R. K. Ghosh","doi":"10.1109/HPDC.1996.546195","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546195","url":null,"abstract":"The paper presents a distributed genetic algorithm (GA) implementation that obtains consistent, good-quality results for different ordering problems. Most importantly, the solution found by the proposed Distributed GA is not only of high quality but also robust, and the algorithm does not require fine-tuning of the crossover and mutation probabilities. In addition, the Distributed GA is simple to implement and does not require any specialized, expensive hardware. Fault tolerance is also provided by dynamically reconfiguring the distributed system in the event of a process or machine failure. The effectiveness of using a simple crossover scheme with the Distributed GA is demonstrated by solving three variations of the Traveling Salesman Problem (TSP).","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115911781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed job scheduling in SCI local-area multiprocessors","authors":"S. Agasaveeran, Qiang Li","doi":"10.1109/HPDC.1996.546231","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546231","url":null,"abstract":"A Local Area Multiprocessor (LAMP) is a network of personal workstations with distributed shared physical memory provided by high-performance technologies such as SCI. A LAMP is more tightly coupled than traditional local area networks (LANs) but more loosely coupled than bus-based multiprocessors. The paper presents a distributed scheduling algorithm that exploits the distributed shared memory in SCI-LAMP to schedule idle remote processors among the requesting workstations. The algorithm considers fairness by allocating remote processing capacity to the requesting workstations based on their priorities, following the decay-usage scheduling approach. The performance of the algorithm in scheduling both sequential and parallel jobs is evaluated by simulation. Higher-priority nodes are found to achieve faster job response times and higher speedups than lower-priority nodes. Lower scheduling overhead allows finer-grained sharing of remote processors than in a LAN.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115769987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ALFred, a protocol compiler for the automated implementation of distributed applications","authors":"T. Braun, Isabelle Chrisment, C. Diot, F. Gagnon, Laurent Gautier","doi":"10.1109/HPDC.1996.546216","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546216","url":null,"abstract":"This paper describes the design and prototyping of ALFred, a compiling tool for the automated implementation of distributed applications. The compiler starts from the formal specification of an application written in ESTEREL, then integrates end-to-end communication functions tailored to the application characteristics (described in the specification), and finally produces a high-performance implementation. The paper describes the communication architecture associated with this approach. The compiler consists of a control compiler, also called the ALF compiler, and a data manipulation compiler (the ILP compiler) that combines data manipulation functions efficiently (the ILP loop). The ALFred compiler has been designed to allow the development and analysis of non-layered high-performance communication architectures based on ALF and ILP.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126142965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Commodity clusters: performance comparison between PCs and workstations","authors":"R. Carter, John Laroco, R. Armstrong","doi":"10.1109/HPDC.1996.546199","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546199","url":null,"abstract":"DAISy (Distributed Array of Inexpensive Systems) is a 16-node PC cluster running a full Unix-compatible operating system. The network media used include standard 10 Mb/s (10BASE-2) Ethernet (used for client node NFS mounts and any interactive work users find necessary on client nodes) and switched 100 Mb/s (100BASE-TX) Fast Ethernet (used for user program message-passing traffic). The DAISy cluster is used to investigate the viability of commodity PC technology for computing scientific and engineering problems traditionally solved on \"supercomputers\" and, more recently, on high-performance RISC workstations and clusters of RISC workstations. Performance analysis of the various single-node subsystems was carried out, along with performance analysis of the cluster as a whole on a number of parallel applications. The results show that the performance of the Pentium 90 MHz CPUs and motherboards used is well within the range of many low-end workstations offered by traditional workstation vendors.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126220759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel solution to the cutting stock problem for a cluster of workstations","authors":"Lisa D. Nicklas, R. W. Atkins, Sanjeev Setia, Pearl Y. Wang","doi":"10.1109/HPDC.1996.546223","DOIUrl":"https://doi.org/10.1109/HPDC.1996.546223","url":null,"abstract":"The paper describes the design and implementation of a solution to the constrained 2D cutting stock problem on a cluster of workstations. The constrained 2D cutting stock problem is an irregular problem with a dynamically modified global data set and irregular amounts and patterns of communication. A replicated data structure is used for the parallel solution since the ratio of reads to writes is known to be large. Mutual exclusion and consistency are maintained using a token based lazy consistency mechanism, and a randomized protocol for dynamically balancing the distributed work queue is employed. Speedups are reported for three benchmark problems executed on a cluster of workstations interconnected by a 10 Mbps Ethernet.","PeriodicalId":267002,"journal":{"name":"Proceedings of 5th IEEE International Symposium on High Performance Distributed Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1996-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128080496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}