{"title":"The double scheme: deadlock-free dynamic reconfiguration of cut-through networks","authors":"Ruoming Pang, T. Pinkston, J. Duato","doi":"10.1109/ICPP.2000.876160","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876160","url":null,"abstract":"Network-based computing systems often require the ability to reconfigure the routing algorithm to reflect changes in network topology if and when those changes occur. The process of reconfiguring a network's routing capabilities may lead to deadlock if not handled properly. In this paper we propose efficient and deadlock-free dynamic reconfiguration techniques that are generically applicable to distributed routing algorithms and networks, including those which use wormhole switching. The proposed techniques do not impede the transmission of packets during the reconfiguration process, thus providing increased network availability and quality-of-service (QoS) support as compared to traditional techniques based on static reconfiguration.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124862723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel channel-adaptive uplink access control protocol for nomadic computing","authors":"Yu-Kwong Kwok, V. Lau","doi":"10.1109/ICPP.2000.876174","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876174","url":null,"abstract":"We consider the uplink access control problem in a mobile computing system, which is based on a cellular phone network in that a user can use the mobile device to transmit voice or file data. This resource management problem is important because efficient solution to uplink access control is critical for supporting a large user population with a reasonable level of quality of service (QoS). While there are a number of recently proposed protocols for uplink access control, these protocols possess a common drawback in that they do not exploit well the burst error properties, which are inevitable in a wireless communication system. In this paper, we propose a novel TDMA-based uplink access protocol, which employs a channel state dependent allocation strategy. Our protocol is motivated by two observations: (1) when channel state is bad, the throughput is low due to large amount of FEC (forward error correction) or excessive ARQ (automatic repeated request) is needed; and (2) because of (1), much of the mobile device's energy is wasted. The proposed protocol works closely with the underlying physical layer in that through observing the channel state information (CSI) of each mobile user, the MAC protocol first segregates a set of users with good CSI from requests gathered in the request contention phase of an uplink frame. The protocol then judiciously allocates channel bandwidth to contending users based on their channel conditions. Simulation results indicate that the proposed protocol considerably outperforms five state-of-the-art protocols in terms of packet loss, delay, and throughput.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125266680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ADAPT: Automated De-coupled Adaptive Program Transformation","authors":"Michael J. Voss, R. Eigenmann","doi":"10.1109/ICPP.2000.876107","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876107","url":null,"abstract":"Dynamic program optimization offers performance improvements far beyond those possible with traditional compile-time optimization. These gains are due to the ability to exploit both architectural and input data set characteristics that are unknown prior to execution time. In this paper, we propose a novel framework for dynamic program optimization, ADAPT (Automated De-coupled Adaptive Program Transformation), that builds on the strengths of existing approaches. The key to our framework is the de-coupling of the dynamic compilation of new code variants from the dynamic selection of these variants at their points of use. This allows code generation to occur concurrently with program execution, removing dynamic compilation overheads from the critical path. We present a compilation system, based on the Polaris optimizing compiler, that automatically applies this framework to general \"plugged-in\" optimization techniques. We evaluate our system on three programs from the SPEC floating point benchmark suite by dynamically applying loop distribution, loop unrolling, loop tiling and automatic parallelization. We show that our techniques can improve performance by as much as 70% over statically optimized code.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124603881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial resolution in data value predictors","authors":"Toshinori Sato, I. Arita","doi":"10.1109/ICPP.2000.876078","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876078","url":null,"abstract":"Recently, the practice of speculation in resolving data dependences has been studied as a means of extracting more instruction level parallelism (ILP). An outcome of an instruction is predicted by value predictors. The instruction and its dependent instructions can be executed simultaneously, thereby exploiting ILP aggressively. One of the serious hurdles for realizing data speculation is huge hardware budget of the predictors. In this paper, we investigate a technique reducing the budget by employing partial resolution, using fewer tag address bits than necessary to uniquely identify every instruction. Simulation results show only two tag bits are enough for achieving performance improvement comparable to full resolution, saving the hardware budget of value predictors substantially.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121422559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Match virtual machine: an adaptive runtime system to execute MATLAB in parallel","authors":"M. Haldar, A. Nayak, Abhay Kanhere, P. Joisha, N. Shenoy, A. Choudhary, P. Banerjee","doi":"10.1109/ICPP.2000.876100","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876100","url":null,"abstract":"MATLAB is one of the most popular languages for desktop numerical computations as well as for signal and image processing applications. Applying parallel processing techniques to improve performance of MATLAB codes has been the goal of many recent works. Most current frameworks require the user to specify parallelism and/or information regarding type/shape of the variables, thereby sacrificing the user friendliness which is one of the most popular MATLAB features. Other systems work on a restricted subset of MATLAB, thereby limiting the class of applications MATLAB can support. We present a runtime system capable of executing MATLAB code in parallel without any user intervention. The runtime system performs automatic parallelization and type/shape inference of the code at runtime. A unique feature of the runtime system is its capability to automatically adapt to changes in the underlying architecture, making it particularly useful for systems where predicting performance statically is difficult. We present experimental results obtained for the runtime system running on SGI Origin2000 shared memory multiprocessor.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"14 17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127928358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable, cost-effective, and flexible disk system using high-performance embedded-processors","authors":"Aki W. Tomita, Naoki Watanabe, Y. Takamoto, S. Inohara, F. Maciel, Hiroaki Odawara, M. Sugie","doi":"10.1109/ICPP.2000.876147","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876147","url":null,"abstract":"As a scalable, cost-effective, and flexible solution for data-intensive systems, we are exploring active-network-storage (ANS), which is an array of ANS disk drives. The ANS drive improves flexibility by using a modular software design; that is, users can specify functions of the ANS drive by loading/unloading the corresponding modules on it. To keep the ANS drive cost-effective, users are allowed to choose whether native code modules or platform-independent Java-bytecode modules are executed on the drive. We forecast that a current high-performance embedded-processor is powerful enough to enable this modular design to be implemented and to provide a scalable, cost-effective, and flexible ANS system. We have confirmed our forecast by conducting an experiment with an ANS drive prototype with a 200 MHz embedded-processor running database sequential scanning and NFS, which are typical off-loaded functions with different characteristics. To evaluate scalability and cost-effectiveness of the ANS system, we estimated the throughput from measurements on our ANS prototype, and we compared it with the throughput that was measured on a 450 MHz Pentium II Xeon server. Our estimation indicates that the scan throughput of the ANS system increases up to 71 MB/s while that of the server saturates at 25 MB/s because of its CPU bottleneck. The NFS read/write throughputs of two ANS drives surpassed the server maximum throughputs.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115786401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lower bounds on precedence-constrained scheduling for parallel processors","authors":"Ivan D. Baev, W. Meleis, A. Eichenberger","doi":"10.1109/ICPP.2000.876172","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876172","url":null,"abstract":"We consider two general precedence-constrained scheduling problems that have wide applicability in the areas of parallel processing, high performance compiling, and digital system synthesis. These problems are intractable so it is important to be able to compute tight bounds on their solutions. A tight lower bound on makespan scheduling can be obtained by replacing precedence constraints with release and due dates, giving a problem that can be efficiently solved. We demonstrate that recursively applying this approach yields a bound that is provably tighter than other known bounds, and experimentally shown to achieve the optimal value at least 86.5% of the time over a synthetic benchmark. We compute the best known lower bound on weighted completion time scheduling by applying the recent discovery of a new algorithm for solving a related scheduling problem. Experiments show that this bound significantly outperforms the linear programming-based bound. We have therefore demonstrated that combinatorial algorithms can be a valuable alternative to linear programming for computing tight bounds on large scheduling problems.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123486412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-node multicast in three and higher dimensional wormhole tori and meshes with load balance","authors":"Ming-Hour Yang, Y. Tseng, Ming-Shian Jian, Chao Lin","doi":"10.1109/ICPP.2000.876067","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876067","url":null,"abstract":"This paper considers the multi-node multicast problem in a multi-dimensional wormhole-routed torus/mesh, where there are an arbitrary number of source nodes each intending to multicast a message to an arbitrary set of destinations. This problem requires a large amount of bandwidth, and thus typically incurs heavy contention and congestion. Evenly balancing the traffic load around the network is a critical issue to achieve good performance. We show how to use a network-partitioning approach to achieve this goal. Simulation results show significant improvement over existing results in 3D tori and meshes. This work is an extension of our earlier work (2000) from 2D tori/meshes to higher dimensional ones.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125780503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load redundancy removal through instruction reuse","authors":"Jun Yang, R. Gupta","doi":"10.1109/ICPP.2000.876075","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876075","url":null,"abstract":"Instruction reuse techniques have been developed to detect and remove redundancy at runtime. By maintaining the execution history of an instruction, reuse techniques detect if a subsequent execution of an instruction will yield the same result as its previous execution, and if this is the case, the result is made available to dependent instructions without executing the instruction. This approach eliminates same instruction redundancy, that is, redundancy across different dynamic instances of the same static instruction. However, the main limitation of existing instruction reuse techniques is that they do not detect or eliminate different instruction redundancy, that is, redundancy across dynamic instances of statically distinct instructions. We present instruction reuse techniques for load redundancy removal that eliminate both same and different instruction redundancy. We first present a study that shows that in addition to significant levels of same instruction redundancy (average of 20%), load instructions also contain high levels (average of 35%) of different instruction redundancy arising at other load or store instructions. We also describe studies that characterize the behavior of the redundancy and develop a hardware implementation guided by this characterization. Our experiments show that our techniques yield IPC improvements of up to 11% and reduces off-chip traffic due to cache misses by as much as 32% for SPECint95 benchmarks.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Depth first search and location based localized routing and QoS routing in wireless networks","authors":"I. Stojmenovic, Mark Russell, B. Vukojevic","doi":"10.1109/ICPP.2000.876111","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876111","url":null,"abstract":"In a localized routing algorithm, node A currently holding the message forwards it based on the location of itself, its neighboring nodes and destination. We propose to use depth first search (DFS) method for routing decisions. Each node A, upon receiving the message for the first time, sorts all its neighbors according to a criteria such as their distance to destination and uses that order in DFS algorithm. It is the first localized algorithm that guarantees delivery for (connected) wireless networks modeled by arbitrary graphs, including inaccurate location information. We then propose the first localized QoS routing algorithm for wireless networks. It performs DFS routing algorithm after edges with insufficient bandwidth or insufficient connection time are deleted from the graph, and attempts to minimize hop count. This is also the first paper to apply GPS in QoS routing decisions, and to consider the connection time (estimated lifetime of a link) as a QoS criterion. The average length of measured QoS path in our experiments, obtained by DFS method, was between 1 and 1.34 times longer than the length of QoS path obtained by shortest path algorithm. The overhead is considerably reduced by applying the concept of internal nodes.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130757345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}