{"title":"Maintaining spatial data sets in distributed-memory machines","authors":"Susanne E. Hambrusch, A. Khokhar","doi":"10.1109/IPPS.1997.580982","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580982","url":null,"abstract":"Proposes a distributed data structure for maintaining spatial data sets on message-passing, distributed memory machines. The data structure is based on orthogonal bisection trees and it captures relevant characteristics of parallel machines. The operations we consider include insertion, deletion and range queries. We introduce parameters to control how much imbalance is tolerated at each processor and to specify the load to be achieved during balancing. When balancing, we first broadcast point counts of a data-dependent partition of the data. Based on this partition, we propose load balancing methods with different communication and computation requirements. We present initial experimental results for the Cray T3D.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"16 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133357362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conflict-free access to multiple single-ported register files","authors":"S. M. Müller, U. Vishkin","doi":"10.1109/IPPS.1997.580974","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580974","url":null,"abstract":"Presents a novel static algorithm for mapping values to multiple register files. The algorithm is based on the edge-coloring of a bipartite graph. It allows the migration of values among the register files to keep the number of RAMs as small as possible. By comparison with the register file design used in the Cydra 5 mini-supercomputer, our approach substantially reduces the number of RAMs. This reduction actually grows with the issue rate. For a system with an issue rate of 6 instructions per cycle, the cost (gate count) of the register files is already cut by half. On a numerical workload like the Livermore Loops, both designs achieve roughly the same performance.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123226697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control schemes in a generalized utility for parallel branch-and-bound algorithms","authors":"Y. Shinano, K. Harada, R. Hirabayashi","doi":"10.1109/IPPS.1997.580966","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580966","url":null,"abstract":"Branch-and-bound algorithms are general methods applicable to various combinatorial optimization problems and parallelization is one of the most promising methods for improving these algorithms. Parallel branch-and-bound algorithm implementations can be divided into two types based on whether a central or a distributed control scheme is used. Central control schemes have reduced scalability because of bottleneck problems which are frequently encountered. In order to solve problem cases that cannot be solved with a sequential branch-and-bound algorithm distributed control schemes are necessary. However, compared to central control schemes, higher efficiency is not always achieved through the use of a distributed control scheme. A mixed control scheme is proposed, changing between the two different types of control schemes during execution. In addition, a dynamic load balancing strategy is applied in the distributed control scheme. Performance evaluation for three different cases is carried out: central, distributed and mixed control schemes. Several TSP instances from the TSPLIB are experimentally solved, using up to 101 workstations. The results of these experiments show that the mixed control scheme is one of the most promising control schemes and furthermore, the hybrid selection rule, which was introduced in the authors' previous work, has an advantage in parallel branch-and-bound algorithms.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125149158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid time synchronization implemented through special ring array for mesh or torus","authors":"Yuzhong Sun, Zhiwei Xu, Mingfa Zhu","doi":"10.1109/IPPS.1997.580957","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580957","url":null,"abstract":"In this paper, we present a new efficient hybrid time synchronization scheme for mesh or torus interconnection networks, called ROCTS. ROCTS comprises two levels, one for the lower level that is implemented over a special high-speed ring array, one for the mesh or torus network. In ROCTS, the second network we construct is different from the past, which is a ring array with each ring not connected to any other. We can implement ROCTS in a distributed and fault-tolerant way. On the other hand, we combine the advantages of hardware and linear-envelope methods so that we can improve the performance of time synchronization even if the work load on a mesh or torus network is not uniform or even if blocked nodes occur.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125341243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MTIO. A multi-threaded parallel I/O system","authors":"S. More, A. Choudhary, Ian T Foster, Ming Q. Xu","doi":"10.1109/IPPS.1997.580928","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580928","url":null,"abstract":"Presents the design and evaluation of MTIO (Multi-Threaded Input/Output), a multi-threaded runtime library for parallel I/O. We extend the multi-threading concept to separate the computation and I/O tasks into two separate threads of control. Multi-threading in our design permits (a) asynchronous I/O even if the underlying file system does not support asynchronous I/O; (b) copy avoidance from the I/O thread to the compute thread by sharing address space; and (c) a capability to perform collective I/O asynchronously without blocking the compute threads. Further, this paper presents techniques for collective I/O which maximize load balance and concurrency while reducing communication overhead in an integrated fashion. Performance results on an IBM SP2 for various data distributions and access patterns are presented. The results show that there is a tradeoff between the amount of concurrency in I/O and the buffer size designated for I/O, and that there is an optimal buffer size beyond which the benefits of larger requests diminish due to large communication overheads.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124137141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic processor scheduling with client resources for fast multi-resolution WWW image browsing","authors":"Daniel Andresen, Tao Yang, D. Watson, Athanassios S. Poulakidas","doi":"10.1109/IPPS.1997.580877","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580877","url":null,"abstract":"WWW based Internet information service has grown enormously during the last few years, and major performance bottlenecks have been caused by WWW server and Internet bandwidth inadequacies. Utilizing client site computing power and also multiprocessor support at the server site can substantially improve the system response time. We examine the use of scheduling techniques in monitoring and adapting to workload variation at client and server sites for supporting fast WWW image browsing. We provide both analytic and experimental results to identify the impact of system loads and network bandwidth on response times and demonstrate the effectiveness of our scheduling strategy.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126294627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel Tabu search algorithm for the 0-1 multidimensional knapsack problem","authors":"S. Niar, A. Fréville","doi":"10.1109/IPPS.1997.580948","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580948","url":null,"abstract":"The 0-1 multidimensional knapsack problem (0-1 MKP) is an NP-complete problem. Its resolution for large scale instances requires a prohibitive processing time. In this paper we propose a new parallel meta-heuristic algorithm based on the Tabu search (TS) for the resolution of the 0-1 MKP. We show that, in addition to reducing execution time, parallel processing can perform automatically and dynamically the settings of some TS parameters. This last point is realized by analyzing the information given by cooperative parallel search processes, and by modifying the search processes during the execution. This approach introduces a new level where balancing between intensification and diversification can be realized.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131837676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Oblivious routing algorithms on the mesh of buses","authors":"K. Iwama, Eiji Miyano","doi":"10.1109/IPPS.1997.580986","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580986","url":null,"abstract":"An optimal [1.5N^{1/2}] lower bound is shown for oblivious routing on the mesh of buses, a two-dimensional parallel model consisting of N^{1/2}×N^{1/2} processors, N^{1/2} row and N^{1/2} column buses but no local connections between neighbouring processors. Many lower bound proofs for routing on mesh-structured models use a single instance (adversary) which includes difficult packet-movement. This approach does not work in our case; our proof is the first which exploits the fact that the routing algorithm has to cope with many different instances. Note that the two-dimensional mesh of buses includes 2N^{1/2} buses and each processor can access two different buses. Apparently the three-dimensional model provides more communication facilities, namely, including 3N^{2/3} buses and each processor can access three different buses. Surprisingly, however, the oblivious routing on the three-dimensional mesh of buses needs more time, i.e., Ω(N^{2/3}) steps, which is another important result of this paper.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130446119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DFRN: a new approach for duplication based scheduling for distributed memory multiprocessor systems","authors":"G. Park, B. Shirazi, J. Marquis","doi":"10.1109/IPPS.1997.580875","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580875","url":null,"abstract":"Duplication based scheduling (DBS) is a relatively new approach for solving multiprocessor scheduling problems. The problem is defined as finding an optimal schedule which minimizes the parallel execution time of an application on a target system. We classify DBS algorithms into two categories according to the task duplication method used. We then present our new DBS algorithm that extracts the strong features of the two categories of DBS algorithms. Our simulation study shows that the proposed algorithm achieves considerable performance improvement over existing DBS algorithms with equal or less time complexity. We analytically obtain the boundary condition for the worst case behavior of the proposed algorithm and also prove that the algorithm generates an optimal schedule for a tree structured input directed acyclic graph.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132857665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of the efficiency of shared attraction memories in cluster-based COMA multiprocessors","authors":"A. Landin, Mattias Karlgren","doi":"10.1109/IPPS.1997.580829","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580829","url":null,"abstract":"The performance of a COMA multiprocessor greatly depends on the efficiency of the large node caches, the attraction memories. When more than one processor shares an attraction memory its behavior is changed. From experiments with program-driven simulation we have found that clustering may improve the performance of the attraction memory significantly. Traffic is reduced, and the miss rates are lower for shared attraction memories. However, clustering may introduce contention for the attraction memory that may ruin any potential performance gain from increased attraction memory hit rate. Provided enough local bandwidth, application execution can remain efficient at higher memory pressure in clustered systems than in systems with single processor nodes. At very high memory pressure some applications change behavior and start suffering from clustering. This is caused by conflict misses due to the relatively lower associativity of the shared attraction memory.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133311470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}