{"title":"Synthetic workload generation for load-balancing experiments","authors":"P. Mehra, B. Wah","doi":"10.1109/M-PDT.1995.414840","DOIUrl":"https://doi.org/10.1109/M-PDT.1995.414840","url":null,"abstract":"The Dynamic Workload Generator (DWG) accurately replays measured workloads in the presence of competing foreground tasks. We have used this workload-generation tool to predict the relative speedups of different sites for an incoming task in our prototype system, using only the resource-utilization patterns observed before the task arrives. Our results show that the load-balancing policies learned by our system effectively exploit idle resources of a distributed computer system. DWG is a facility for generating realistic and reproducible synthetic workloads for use in load-balancing experiments. For such experiments, the generated workload must not only mimic the highly dynamic resource-utilization patterns found on today's distributed systems but also behave as a real workload does when test jobs run concurrently with it. The latter requirement is important in testing alternative load-balancing strategies, a process that requires running the same job multiple times, each time at a different site but under an identical network-wide workload. Parts of DWG are implemented inside the operating-system kernel and have complete control over the utilization levels of four key resources: CPU, memory, disk, and network. Besides accurately replaying network-wide load patterns recorded earlier, DWG gives up a fraction of its resources each time a new job arrives and reclaims these resources upon job completion. Pattern-doctoring rules implemented in DWG control the latter operation. This article presents DWG's architecture, its doctoring rules, systematic methods for adjusting and evaluating doctoring rules, and experimental results on a network of Sun workstations.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132931382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partitioning unstructured computational graphs for nonuniform and adaptive environments","authors":"M. Kaddoura, Chao-Wei Ou, S. Ranka","doi":"10.1109/M-PDT.1995.414844","DOIUrl":"https://doi.org/10.1109/M-PDT.1995.414844","url":null,"abstract":"In heterogeneous computing environments, computational resources can have a nonuniform distribution that changes over time. To execute in such an environment, many irregular and loosely synchronous data-parallel applications must be carefully mapped. This article examines algorithms that provide this mapping by efficiently partitioning the computational graphs of these applications. Heterogeneity has become commonplace in high-performance computing environments. In the future most computing environments will consist of a cluster of nodes connected by a high-speed interconnection network. Node architectures will include high-performance SIMD and MIMD parallel computers as well as numerous high-performance workstations. In a heterogeneous environment, users can pool many computational resources to create a large virtual machine. This environment can be nonuniform -- that is, the machines or processors can have different computational powers. However, the pool of resources might change over the computation's lifetime because of machine failures or differing use patterns. It should be possible to add or remove resources without significantly affecting the other machines or changing the existing software. In such an adaptive environment, an individual machine could either be dedicated to a single user's computation or shared by users. The former strategy has the advantage that each machine has static computing capability, while the latter has the advantage of a higher rate of use. In this article we'll examine the mapping requirements for the parallelization of a large class of irregular and loosely synchronous data-parallel applications on nonuniform and adaptive environments. The computational structure of these applications can be described as a computational graph. In such a graph, nodes represent computational tasks and edges describe the communication between tasks. For many applications, the graph's vertices correspond to 2D and 3D coordinates, and the interaction between computations is limited to physically proximate vertices. Recursive coordinate bisection, index-based mapping, and recursive spectral bisection can exploit these properties to partition such applications. Essentially, these algorithms cluster proximate points together to form a partition such that the numbers of vertices attached to every partition are equal. Other researchers have used these algorithms to map graphs onto uniform parallel machines. We'll evaluate how the algorithms partition computational graphs on a simulation of a cluster of machines constituting a static, nonuniform environment. (In a static environment, computational resources are fixed throughout the completion of all tasks.) The algorithms assume that an interconnection network connects all the processors and that the cost of unit communication is the same between all the processors. (A bus is an example of such a network.) Although our algorithms specifically target a network-connected cluster of workstations","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133959389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulating and analyzing railway interlockings in ExSpect","authors":"T. Basten, R. Bol, M. Voorhoeve","doi":"10.1109/M-PDT.1995.414843","DOIUrl":"https://doi.org/10.1109/M-PDT.1995.414843","url":null,"abstract":"This study evaluates the ability of ExSpect, a toolkit for simulating and analyzing complex distributed systems using colored Petri nets, to analyze ISL specifications for railway interlockings. A railway interlocking--which is designed to guarantee the safety of train movements--is an extremely complex distributed system. The behavior of such a system--and thus its correctness--is hard to understand and even more difficult to analyze. Recognizing that verification of safety requirements in such a system would not be possible without a way to formally describe system behavior, the Dutch railway company, Nederlandse Spoorwegen, designed a set of formal languages, called the Interlocking Specification Language (ISL), also known as Euris. Engineers at NS envisioned that ISL would let them describe and simulate interlocking behavior, verify safety requirements, and optimize interlocking behavior. This in turn could lead to the creation of an infrastructure that would allow more flexible train schedules. However, although ISL is an important step toward a more formal approach to building and maintaining interlockings, it is not suitable for verifying safety requirements because it lacks a firm mathematical basis. The study described here, conducted by the Eindhoven University of Technology in cooperation with NS, is a first step toward the simulation and verification of ISL specifications that is grounded in mathematical theory. As part of the study, we translated a small part of an ISL specification into the graphical and functional language used by the ExSpect toolkit. ExSpect, which is short for Executable Specification tool, is a graphical specification and simulation package developed at the university and commercially available from Bakkenist Management Consultants. It is a general-purpose tool, based on the theory of Petri nets, that combines a graphical user interface for specifying and simulating many types of distributed systems with analysis tools for verifying the properties of such systems. The goals of the study were to investigate to what extent NS engineers could use ExSpect to improve simulation and verification in ISL and to evaluate the strengths and weaknesses of ExSpect in an interesting real-world application. Many constructs in ISL map almost directly to ExSpect constructs. Thus, the study also laid the foundation for an ISL-to-ExSpect compiler. The study revealed that ExSpect has many advantages over ISL in simulation. It also revealed that we cannot yet verify any safety properties of an interlocking. First, it is not clear exactly what the safety requirements of an interlocking are, as they are described in ISL. Second, and more compelling, a railway interlocking specification is far too complex for formal verification with current technology. We did, however, learn some interesting things about ExSpect's abilities and gained much insight into possible extensions.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"291 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133692702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced PVM communications over a high-speed LAN","authors":"Sheue-Ling Chang, D. Du, J. Hsieh, R. Tsang, Mengjou Lin","doi":"10.1109/M-PDT.1995.414841","DOIUrl":"https://doi.org/10.1109/M-PDT.1995.414841","url":null,"abstract":"Performance results of PVM over a local ATM network show that it offers much greater communication bandwidth than traditional LANs such as Ethernet. Application-level performance, however, still lags far behind the capabilities of the physical medium. Realizing the full potential of high-speed networks, therefore, will require further improvements in both hardware and software components of network I/O subsystems. Emulating a parallel machine via a collection of heterogeneous, independent hosts and a general-purpose local-area network has obvious advantages, including cost-effectiveness and very large aggregate processing power and memory. However, the ability of most general-purpose LANs to support communication-intensive parallel applications is debatable. Today, with the emergence of several high-speed, switch-based networks, such as High-Performance Parallel Interface (Hippi), Fibre Channel, and Asynchronous Transfer Mode (ATM), networks that effectively support communication-intensive parallel applications may soon become a reality. Network-based computing offers several advantages. First, independent, commercially available systems and a general LAN can readily incorporate advances in processor and network technology. Second, due to the large memory and processing power available in the aggregate collection of individual host systems, very large applications can execute on a collection of relatively low-priced host systems. Third, the underlying network can support high-speed input/output to applications, for instance, by using disk arrays. One factor that previously fueled much skepticism about the feasibility of network-based parallel computing was the limitations imposed by using traditional LANs, such as Ethernet, as the system interconnect. For many typical network applications that require only occasional file transfers, or infrequent small amounts of data to be transmitted between workstations, an Ethernet-based cluster of workstations will suffice. However, for network-based applications, such as communication-intensive, coarse-grained parallel applications, traditional networks such as Ethernet simply cannot provide adequate performance. Thus, for this study, we chose ATM, a high-speed transport technology, as the supporting communication medium.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133829996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel I/O: Getting ready for prime time","authors":"D. Reed, C. Catlett, A. Choudhary, D. Kotz, M. Snir","doi":"10.1109/mpdt.1995.9283668","DOIUrl":"https://doi.org/10.1109/mpdt.1995.9283668","url":null,"abstract":"During the International Conference on Parallel Processing, held August 15-19, 1994, we convened a panel to discuss the state of the art in parallel I/O, tools and techniques to address current problems, and challenges for the future. The following is an edited transcript of that panel.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115397301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inherently stable real-time priority list dispatchers","authors":"A. Krings, R. Kieckhafer, J. Deogun","doi":"10.1109/88.345961","DOIUrl":"https://doi.org/10.1109/88.345961","url":null,"abstract":"This paper concerns the problem of scheduling instability, where reducing the duration of one or more tasks delays the starting time of another task. This can make it difficult or impossible to guarantee real-time deadlines. Unfortunately, simulating schedules with all possible variations in the durations of all tasks is in most cases an intractable problem. Therefore, task dispatching must be provably stable for all permissible variations in durations. To alleviate scheduling instability, we have developed a new class of provably stable runtime dispatchers that are less restrictive than known stabilization algorithms. Extensive simulations show that even the simple, low-overhead dispatchers perform remarkably well. These results suggest that \"simple is better\". Developers of hard real-time systems can implement fast, simple low-overhead runtime task dispatchers that guarantee stability but still deliver near-optimal performance.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116130830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automating the development of distributed control software","authors":"J. Bass, A. Browne, M. S. Hajji, D. G. Marriott, P. Croll, P. Fleming","doi":"10.1109/88.345964","DOIUrl":"https://doi.org/10.1109/88.345964","url":null,"abstract":"The Development Framework translates application-specific system specifications into parallel, hard real-time implementations, using methods that are both familiar to developers and optimal for the application. The Development Framework approach applies CASE tools--as well as several new tools--to the development of distributed systems, so designers can concentrate on the control-engineering aspects of their systems. The approach addresses three development phases: specification, software design, and implementation. In the specification phase, the control engineer refines behavioral requirements through simulation and analysis, thereby verifying that the system meets its functional requirements prior to implementation. Once the simulated behavior is satisfactory, the specified behavior is translated into a design. Finally, our tools produce source code, either by automatically generating it or by drawing it from a library. We describe the new and existing tools we apply during each phase. We then demonstrate our approach using an example of a linearized roll-pitch-yaw autopilot and airframe model.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128426546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the variance of point-to-point transfers for parallel real-time programs","authors":"R. Mraz","doi":"10.1109/88.345963","DOIUrl":"https://doi.org/10.1109/88.345963","url":null,"abstract":"Investigations that analyze the time an operating system takes to schedule, interrupt and \"context-switch\" to another process or job have helped developers produce highly optimized and tuned operating systems that can provide more than 99% sustained processor use for most uniprocessor applications. However, when these operating systems are installed on CPUs that are interconnected with a low-latency (user-space) communication mechanism, large variances typically occur in the time it takes to send a point-to-point message. In this article, we examine how to reduce the difference between worst-case and average-case message latency that can contribute to variance in fine-grain parallel programs. Changing how the operating system handles interrupt processing and scheduling can greatly reduce the difference between these latencies, thus increasing a program's performance.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121420799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed, real-time control of structurally flexible manipulators","authors":"M. Sever, G. D’Eleuterio","doi":"10.1109/88.345960","DOIUrl":"https://doi.org/10.1109/88.345960","url":null,"abstract":"Parallel processing facilitates the real-time implementation of an end-effector velocity controller for elastic robotic manipulators. The approach has potential application in teleoperation. We have developed an algorithm that controls end-effector velocity of large robotic manipulators whose size-to-weight ratio renders them structurally flexible. The controller takes desired end-effector velocity as input. It seeks to minimize the velocity tracking error in the end-effector by concentrating its effort on the end-effector while letting the manipulator's links deform. The control strategy relies on a feedforward-feedback concept, designed around an augmented model of the manipulator dynamics, which includes derivatives of the control input in addition to integrator states that minimize tracking error. We implemented the algorithm for parallel processing on a multiprocessor system. The controller's performance agrees well with computer simulations.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122685842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communication issues in parallel computing across ATM networks","authors":"Chengchang Huang, P. McKinley","doi":"10.1109/88.345959","DOIUrl":"https://doi.org/10.1109/88.345959","url":null,"abstract":"To support distributed parallel computing across asynchronous transfer mode networks, the interplay between communications software and the network architecture is critical. In particular, we can improve communication performance significantly by efficiently designing collective communication operations, such as multicast and reduction.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126225569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}