{"title":"Low cost parallel solutions for the VRPTW optimization problem","authors":"O. Arbelaitz, Clemente Rodríguez Lafuente","doi":"10.1109/ICPPW.2001.951932","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951932","url":null,"abstract":"In the paper a parallelizable system based on simulated annealing to solve vehicle routing problem with time window (VRPTW) problems is described. The system consists of two optimization phases: a global one, and local one, both based on simulated annealing and parallelizable. For the first phase different parallelization strategies are presented and evaluated. The importance of the co-operation among processors has been made clear: the communication of partial solutions improves the efficiency of optimal solution's search. Two algorithms, a synchronous one and an asynchronous one, stand out due to their good average behaviour related to the quality of solutions found, and due to their stability when augmenting the number of processors. The second phase has shown to be a great complement of the global search that permits to obtain a very fast and practical, low cost parallel system. This system has been able to reach the optimal solution published for the Solomon's benchmark in 85% of the problems, and more important, the averages of any set of random executions are less than 5% worse than the best published.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"341 1","pages":"176-181"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75939906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Restoration in IP over WDM optical networks","authors":"Hwajung Lee, Hongsik Choi, Hyeong-Ah Choi","doi":"10.1109/ICPPW.2001.951960","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951960","url":null,"abstract":"An important requirement in any high speed network is to ensure the network's survivability, i.e., the ability to provide reroutes of ongoing connections after the failure of network components. We consider the problem of embedding an IP layer topology in the WDM transport network layer with the objective of achieving the network's survivability in the IP layer. Specifically, we consider the problem of embedding an arbitrary IP layer topology in a WDM wavelength-routing ring network such that the IP topology remains connected under the presence of the failure of any link in the WDM layer.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"28 1","pages":"263-268"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81183999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A differential bandwidth reservation policy for multimedia wireless networks","authors":"Sunho Lim, G. Cao, C. Das","doi":"10.1109/ICPPW.2001.951985","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951985","url":null,"abstract":"Provisioning of seamless communication for mobile terminal (MT) handoffs as well as guaranteeing a certain level of quality-of-service (QoS) to ongoing connections and new connections are critical issues in multimedia wireless networks. We present a differential bandwidth reservation (DBR) algorithm that can meet these requirements. For bandwidth reservation, the DBR scheme examines a sector of cells, which are located along the way to which the MT might move. The sector of cells are further divided into two regions depending on whether they have an immediate impact on the handoff or not. Two different bandwidth reservation policies are applied to cells in the two regions to optimize the connection dropping rate while maximizing the connection acceptance rate. Two possible MT movements are analyzed using the DBR mechanism. In the first case, no knowledge of the user's moving path is assumed to be available, while in the second case, prior knowledge of a user profile is used in bandwidth reservation, and is called the user profile-based DBR (UPDBR) algorithm. Simulation results indicate that the DBR algorithm is more adaptable to optimize the system performance in terms of call dropping rate compared to prior schemes. The UPDBR scheme can exploit the MT's moving path history for better bandwidth utilization as well as reduction in the number of communication messages compared to the DBR scheme. The overall results show that the proposed schemes not only provide better performance, but also exploit the current state of the system in optimizing different performance parameters.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"10 1","pages":"447-452"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89266632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting SIMD parallelism from 'for' loops","authors":"V. Gustin, P. Bulić","doi":"10.1109/ICPPW.2001.951843","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951843","url":null,"abstract":"The need for multimedia applications has prompted the addition of a multimedia instruction set (MMX) to most existing general-purpose microprocessors. The introduction of short single-instruction multiple data (SIMD) i.e. \"vectorized\" instructions to the microprocessor \"scalar\" instruction set is supported by special hardware which enables the execution of one instruction on multiple data sets. Such a vectorized instruction set is primarily used in multimedia applications, and it seems likely that it will grow rapidly over the next few years. Thus on the one hand we have modern multimedia execution hardware and on the other we have the software and the general compilers which are not able to automatically exploit the multimedia instruction set. In addition, the compiler is not able to locate SIMD parallelism within a basic block. Our solution to these problems is to find statement candidates in the program written in the language C/C++ (as we mainly use this language), and to employ the SIMD instruction set in the easiest possible way. As we know that the compiler cannot be user-changed or modified, we can only extend the functionality of the program (compiler) by the use of specialised library routines or by macros. We prefer the latter. Why? We believe that the use of the macro library is faster than function calls, and we expect it to be simpler and more friendly for the user. The algorithm for identifying candidates for parallel processing (ICPP) is based on the fact that the program does not need any \"correction\" or \"adoption\" prior to being analysed and finally to being translated into the SIMD instruction set. We define the macro library MacroVect.c as the substitution for the discovered statement candidates.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"80 1","pages":"23-28"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91036359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving static scheduling using inter-task concurrency measures","authors":"C. Roig, F. Guirado, A. Ripoll, M. A. Senar, E. Luque","doi":"10.1109/ICPPW.2001.951975","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951975","url":null,"abstract":"A fundamental issue affecting the performance of parallel applications running on distributed systems is the assignment of tasks to processors. This paper shows the effectiveness in scheduling strategies derived from the use of the temporal behaviour of tasks included in the new TTIG (Temporal Task Interaction Graph) model. Experimentation was performed for a set of C+PVM applications running in a PVM platform. These applications were on the one hand synthetic programs whose communication topology matches certain well-known regular graph families such as trees, pipes and meshes and programs with irregular communication patterns. Additionally, a real image processing application was modelled and executed. The TTIG model has been shown to be effective in all cases compared with the classical TIG (Task Interaction Graph) and with the PVM default allocation scheme, and facilitates the development of new more efficient scheduling strategies.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"57 1","pages":"375-381"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84791406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel complete remeshing for adaptive schemes","authors":"Juan J. Pombo, J. C. Cabaleiro, T. F. Pena","doi":"10.1109/ICPPW.2001.951853","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951853","url":null,"abstract":"In order to improve the convergence ratio and to automate Finite Element Methods, several strategies have been introduced. One of these is the adaptive scheme. This approximation presents limitations for parallelism since the generation of a conformal, valid and well conditioned finite element mesh is a time consuming task, and now it appears as a main task in each iteration of the adaptation procedure. This work is motivated by the use of the h-adaptive method in its most flexible form, where a complete reconstruction of the whole mesh has to be performed whenever a solution over the current mesh has been obtained and until error criteria are achieved. We focused on the problem of the fast generation of tetrahedral unstructured meshes in a parallel fashion over geometric models with some given refinement criteria. The chosen strategy implies the use of an octal tree, octree, as a key hierarchical data structure to guide the algorithm. The codes have been developed using the MPI library in a SGI Origin 200 multiprocessor.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"123 1","pages":"73-78"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72732887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing parallel sparse matrix algorithms beyond data dependence analysis","authors":"H. Lin","doi":"10.1109/ICPPW.2001.951838","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951838","url":null,"abstract":"Algorithms are often parallelized based on data dependence analysis manually or by means of parallel compilers. Some vector/matrix computations such as the matrix-vector products with simple data dependence structures (data parallelism) can be easily parallelized. For problems with more complicated data dependence structures, parallelization is less straightforward. The data dependence graph is a powerful means for designing and analyzing parallel algorithm. However for sparse matrix computations, parallelization based on solely exploiting the existing parallelism in an algorithm does not always give satisfactory results. For example, the conventional Gaussian elimination algorithm for the solution of a tri-diagonal system is inherently sequential, so algorithms specially for parallel computation have to be designed. After briefly reviewing different parallelization approaches, a powerful graph formalism for designing parallel algorithms is introduced. This formalism will be discussed using a tri-diagonal system as an example. Its application to general matrix computations is also discussed and its power in designing parallel algorithms beyond the ability of data dependence analysis is shown.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"26 1","pages":"7-13"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89962559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hot-potato routing algorithms for sparse optical torus","authors":"Risto T. Honkanen, M. Penttonen, V. Leppänen","doi":"10.1109/ICPPW.2001.951966","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951966","url":null,"abstract":"In this paper we present an optical network architecture and deflection (or hot potato) routing algorithms supporting efficient communication between n processor nodes in a shared memory parallel computer. The sparse optical torus network consists of an n×n torus, where processor nodes are situated diagonally, and routing nodes are optical deflection nodes of two inputs and two outputs. A design of optical deflection node is presented. Several routing algorithms, based on the greedy routing algorithm, are developed. By experiments and partial theoretical analyses these algorithms run efficiently on sparse optical torus.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"13 1","pages":"302-307"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75409870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A general construction for nonblocking crosstalk-free photonic switching networks","authors":"F. Hwang, Wen-Dar Lin","doi":"10.1109/ICPPW.2001.951965","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951965","url":null,"abstract":"The graph representation G(M) of a multistage network M is well-known. C.T. Lea (1990) observed that link-disjoint paths in M correspond to node-disjoint paths in G(M). He proposed G(M) as a network by treating nodes as crossbars to transfer the node-disjoint property to the crosstalk-free property essential for photonic networks using directional couplers as components. However, such a network has its peculiarities and is not the commonly used type. In this paper we use the same principle to establish the crosstalk-free property for the popular Log₂(N,k,p) network.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"1 1","pages":"297-301"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77338997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The improved conjugate gradient squared (ICGS) method on parallel distributed memory architectures","authors":"L. Yang, R. Brent","doi":"10.1109/ICPPW.2001.951924","DOIUrl":"https://doi.org/10.1109/ICPPW.2001.951924","url":null,"abstract":"For the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the Conjugate Gradient Squared (ICGS) method. The algorithm is derived such that all inner products, matrix-vector multiplications and vector updates of a single iteration step are independent and communication time required for inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication on parallel distributed memory computers can be significantly reduced. The resulting ICGS algorithm maintains the favorable properties of the algorithm while not increasing computational costs. Data distribution suitable for both irregularly and regularly structured matrices based on the analysis of the non-zero matrix elements is also presented. Communication scheme is supported by overlapping execution of computation and communication to reduce mailing times. The efficiency of this method is demonstrated by numerical experimental results carried out on a massively parallel distributed memory system.","PeriodicalId":93355,"journal":{"name":"Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops","volume":"1 1","pages":"161-165"},"PeriodicalIF":0.0,"publicationDate":"2001-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88975004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}