{"title":"Runtime parallel incremental scheduling of DAGs","authors":"Minyou Wu, W. Shu, Yong Chen","doi":"10.1109/ICPP.2000.876171","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876171","url":null,"abstract":"A runtime parallel incremental DAG scheduling approach is described in this paper. A DAG is expanded incrementally, scheduled, and executed on a parallel machine. A DAG scheduling algorithm is parallelized to scale to large systems. In this approach, a large DAG can be executed without consuming a large amount of memory space. Inaccurate estimation of task execution time and communication time can be tolerated. This runtime approach can also execute dynamic DAGs. Implementation of this parallel incremental system demonstrates the feasibility of this approach. Preliminary results show that it is superior to other approaches.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126610269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domination and its applications in ad hoc wireless networks with unidirectional links","authors":"Jie Wu, Hailan Li","doi":"10.1109/ICPP.2000.876117","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876117","url":null,"abstract":"We consider an efficient distributed algorithm for determining a dominating and absorbant set of vertices (mobile hosts) in a given directed graph that represents an ad hoc wireless network with unidirectional links. This approach is based on the concept of dominating set in graph theory. A host v is called a dominating neighbor (absorbant neighbor) of another host u if there is a directed edge (v, u) ((u, v)). A subset of vertices is dominating and absorbant if every vertex not in the subset has one dominating neighbor and one absorbant neighbor in the subset. A quick formation process of a dominating and absorbant set is given and this set can be easily updated when the network topology changes dynamically. Ideas for dominating-set-based routing in an ad hoc wireless network with unidirectional links are also given. The effectiveness of the approach is confirmed through a simulation study.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121815437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable task duplication based scheduling algorithm for heterogeneous systems","authors":"S. Ranaweera, D. Agrawal","doi":"10.1109/ICPP.2000.876154","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876154","url":null,"abstract":"Optimal scheduling of tasks represented by a directed acyclic graph (DAG) onto a set of homogeneous processors is a strongly NP-hard problem. In this paper we introduce a scalable scheduling scheme called STDS for heterogeneous systems. This implies that tasks could potentially have different run times on different processors. The complexity of STDS is O(v^2) where v is the number of nodes in the task graph. Schedule length is primarily reduced by selected task duplication. Current task duplication based scheduling schemes are mostly done for homogeneous systems. Comparing the performance of STDS with BIL, another scheduling scheme for heterogeneous systems, it is observed that STDS obtained speed-ups of 6 to 40 generating shorter schedules when sufficient duplication can be carried out.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"174 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132226775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and evaluation of smart disk architecture for DSS commercial workloads","authors":"G. Memik, M. Kandemir, A. Choudhary","doi":"10.1109/ICPP.2000.876149","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876149","url":null,"abstract":"The requirements for storage space and computational power of large-scale applications are increasing rapidly. Clusters seem to be the most attractive architecture for such applications, due to their low costs and high scalability. On the other hand, smart disk systems, with their large storage capacities and growing computational power are becoming increasingly popular. In this work, we compare the performance of these architectures with a single host-based system using representative queries from the Decision Support System (DSS) databases. We show how to implement individual database operations in the smart disk system and also show how to optimize the execution of the whole query by bundling frequently occurring operations together and executing the bundle in a single invocation. Besides decreasing the overall execution time, operation bundling also offers an easy-to-program and easy-to-use interface to access the data on smart disks. We also present a protocol for minimizing the communication time in the smart disk based system. To measure the response times, we have developed the DBsim, an accurate simulator which can simulate the database operations for the single host-based, cluster-based and smart disk based systems. Using this simulator, we illustrate that the smart disk architecture offers substantial benefits in terms of overall query execution times of the TPC-D benchmark suite. In particular, the average response time of the smart disk architecture for the representative queries from the TPC-D benchmark in our base configuration is 71% smaller than the response time on the single host-based system and 4.2% smaller than the response time on the fastest cluster architecture. We also demonstrate the effectiveness of the operation bundling.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132448168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utilization-based admission control for real-time applications","authors":"D. Xuan, Chengzhi Li, R. Bettati, Jianer Chen, Wei Zhao","doi":"10.1109/ICPP.2000.876137","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876137","url":null,"abstract":"In this paper, we present a methodology to use utilization-based admission control in guaranteed real-time communication in a scalable fashion. We make admission control scalable by using a configuration-time test to determine a safe utilization level of servers. Admission control at run-time then is reduced to simple utilization tests on the servers along the path of the new flow. Furthermore, we discuss how appropriate route selection improves utilization levels, design a safe route selection heuristic algorithm to achieve high utilization of resources, and derive two bounds on the maximum utilization level for given traffic in a network. We compare the results of our route selection heuristics with those of a shortest-path based algorithm, and find that our heuristics can achieve a much higher maximum utilization level than that of the shortest-path based algorithm.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117250693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A balanced approach to high-level verification: performance trade-offs in verifying large-scale multiprocessors","authors":"D. Abts, Mike Roberts, D. Lilja","doi":"10.1109/ICPP.2000.876167","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876167","url":null,"abstract":"A single node of a modern scalable multiprocessor consists of several ASICs comprising tens of millions of gates. This level of integration and complexity imposes an enormous onus on the verification process. A variety of tools, ranging from discrete-event logic simulation to formal model checking, can be used to attack this problem. Unfortunately, conventional simulation techniques, with their primitive interface to the hardware (i.e. test vectors), are inadequate tools for reasoning about the correctness of complex architectural features, such as cache coherence protocols and memory consistency models. Similarly, model checkers offer very limited utility on such large designs. We have previously proposed a novel verification framework, called Raven, that addresses many of these challenges. In this paper we examine the performance implications of verifying systems at higher levels of abstraction. A detailed performance analysis is conducted to compare this higher-level approach against an equivalent Verilog test bench. We establish lower and upper bounds on the performance of the Raven environment executing on a single processor, on a set of distributed processors, and on a shared-memory multiprocessor.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127942981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPS-based message broadcasting for inter-vehicle communication","authors":"Min-Te Sun, Wu-chi Feng, T. Lai, Kentaro Yamada, H. Okada, K. Fujimura","doi":"10.1109/ICPP.2000.876143","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876143","url":null,"abstract":"Intelligent Transportation Systems (ITS) have become a focus for many countries. To achieve ITS, Inter Vehicle Communication (IVC) is required for the exchange and distribution of data such as congestion or emergency information. If this communication can be done without fixed infrastructure, the systems can be deployed quickly and on a larger scale. Ad hoc networking technologies are one such technology to achieve IVC. However, if generic ad hoc network solutions are applied directly to IVC, performance can degrade quickly as the system scales particularly for broadcast type messages. In this paper we propose two new broadcast protocols that reduce bandwidth required for broadcast communication by taking advantage of a vehicle's highly directional movement and Global Positioning Information. To show the performance of our new protocols, we compare our approach with generic ad hoc broadcasting techniques. Our results show that it is possible to achieve several hundred percent improvement of bandwidth utilization with very slight sacrifice of reachability.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131104461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable parallel subspace clustering algorithm for massive data sets","authors":"H. Nagesh, Sanjay Goil, A. Choudhary","doi":"10.1109/ICPP.2000.876164","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876164","url":null,"abstract":"Clustering is a data mining problem which finds dense regions in a sparse multi-dimensional data set. The attribute values and ranges of these regions characterize the clusters. Clustering algorithms need to scale with the data base size and also with the large dimensionality of the data set. Further, these algorithms need to explore the embedded clusters in a subspace of a high dimensional space. However the time complexity of the algorithm to explore clusters in subspaces is exponential in the dimensionality of the data and is thus extremely compute intensive. Thus, parallelization is the choice for discovering clusters for large data sets. In this paper we present a scalable parallel subspace clustering algorithm which has both data and task parallelism embedded in it. We also formulate the technique of adaptive grids and present a truly unsupervised clustering algorithm requiring no user inputs. Our implementation shows near linear speedups with negligible communication overheads. The use of adaptive grids results in two orders of magnitude improvement in the computation time of our serial algorithm over current methods with much better quality of clustering. Performance results on both real and synthetic data sets with very large number of dimensions on a 16 node IBM SP2 demonstrate our algorithm to be a practical and scalable clustering technique.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127294115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilayer VLSI layout for interconnection networks","authors":"C. Yeh, Emmanouel Varvarigos, B. Parhami","doi":"10.1109/ICPP.2000.876069","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876069","url":null,"abstract":"Current VLSI technology allows more than two wiring layers and the number is expected to rise in future. In this paper we show that, by designing VLSI layouts directly for an L-layer model, the layout area for a variety of networks can be reduced by a factor of about (L/2)^2 compared to the layout area required under a 2-layer model, and the volume and maximum wire length can be reduced by a factor of about L/2, leading to considerably lower cost and/or higher performance. The proposed layouts for k-ary n-cubes, hypercubes, butterfly networks, cube-connected cycles (CCC), folded hypercubes, generalized hypercubes, k-ary n-cube cluster-c, hierarchical hypercube networks, reduced hypercubes, hierarchical swap networks, and indirect swap networks, are the best layouts reported for these networks thus far and are optimal within a small constant factor under both the Thompson model and the multilayer grid model. All of our layouts are optimally scalable in that we can allow each network node to occupy the largest possible area (e.g., o(N/L^2) for hypercubes) without increasing the leading constant of the layout area, volume, or maximum wire length.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129204750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fault-tolerant adaptive and minimal routing approach in n-D meshes","authors":"Jie Wu","doi":"10.1109/ICPP.2000.876159","DOIUrl":"https://doi.org/10.1109/ICPP.2000.876159","url":null,"abstract":"In this paper a sufficient condition is given for minimal routing in n-dimensional (n-D) meshes with faulty nodes contained in a set of disjoint fault regions. It is based on an early work of the author on minimal routing in low dimension meshes (such as 2-D meshes with faulty blocks). Unlike many traditional models that assume all the nodes know global fault distribution, our approach is based on the concept of limited global fault information. First, a fault model called fault region is used in which all faulty nodes in the system are contained in a set of disjoint regions. Fault information is coded in a 2n-tuple called extended safety level associated with each node of an n-D mesh to support minimal routing. Specifically, we study the existence of minimal paths at a given source node, limited distribution of fault information, minimal routing, and deadlock-free routing. Our results show that any minimal routing that is partially adaptive can still be applied as long as the destination node meets a certain safety condition. A dynamic planar-adaptive routing scheme is presented that offers better fault tolerance and adaptivity than the regular planar-adaptive routing scheme in n-D meshes.","PeriodicalId":149642,"journal":{"name":"Proceedings 2000 International Conference on Parallel Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131384969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}