{"title":"Speculative parallel graph reduction of lambda calculus to deferred substitution form","authors":"Yong-Hack Lee, Suh-Hyun Cheon","doi":"10.1109/ICAPP.1997.651496","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651496","url":null,"abstract":"In a parallel graph reduction system, speculative evaluation can increase parallelism but waste machine resources by evaluating expression which may eventually be discarded. When a speculative task reduces a lambda expression to WHNF (Weak Head Normal Form), substitution can lead to unbounded growth of the graph size and require copy operation. This speculative task may be unnecessary. In that case the performance is affected by the overheads to terminate all tasks to be propagated from a speculative task and to refresh the memory cells to be allocated for copy operation. We propose a lambda form called DSF (Deferred Substitution Form) which substitution is deferred until a mandatory task will evaluate substitution. In a speculative task to DSF, since there is no substitution. It cannot grow the graph size and require copy operation. Therefore the overhead can be decreased when a expression reduced to DSF is eventually unnecessary. In addition we propose an evaluation model for DSF to increase the parallelism.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124164347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed parallel generation of indices for very large text databases","authors":"João Paulo W. Kitajima, M. D. Resende, B. Ribeiro-Neto, N. Ziviani","doi":"10.1109/ICAPP.1997.651539","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651539","url":null,"abstract":"We propose a new algorithm for the parallel generation of suffix arrays for large text databases on high-bandwidth computer networks. Suffix arrays are structures used in full text indexing which support very powerful query languages. Our algorithm is based on a parallel indirect mergesort (it is not a simple mergesort procedure) and is compared with a well known sequential algorithm (which is very efficient running on a single machine). Although network-bounded, the parallel version is theoretically and experimentally a much better alternative when compared to the sequential version (which is I/O-bounded in disk).","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129317478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating communication sets efficiently on data-parallel programs","authors":"Tsung-Chuan Huang, L. Shiu, Cherng-Haw Yu","doi":"10.1109/ICAPP.1997.651505","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651505","url":null,"abstract":"Generating local memory access sequences and communication sets efficiently is an important issue while compiling a data-parallel language into a SPMD (Single Program Multiple Data) code. Recently, several approaches have been presented; they are based on the case in which array references are distributed across arbitrary number of processors with arbitrary block sizes using block-cyclic distribution. Typically, in order to generate explicit communication sets, each node program has to scan over the local memory access sequences. In this paper, we focus on two cases. First, array references are aligned to a common template and this template is distributed across processors using block-cyclic distribution. Second, array references are distributed across the same number of processors with same block size. The first case is further classified into one-level and two-level mappings. We construct a block state graph to generate communication sets by scanning only a portion of local memory access sequence. In one-level mappings and the second case, we only need to scan the active elements among the first s local active blocks; while in two-level mappings, only need to scan the active elements among the first /spl alpha/*s local active blocks, where s is the stride of regular section and a is the stride of alignment function. 
As a result, the efficiency can be greatly improved.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134639136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High performance computing on networks of workstations through the exploitation of function parallelism","authors":"Yung-Lin Liu, Hau-Yang Cheng, C. King","doi":"10.1109/ICAPP.1997.651514","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651514","url":null,"abstract":"Parallel programs are often written in the SPMD (single-program-multiple-data) form for exploiting data parallelism in the applications. In this paper, we show that even in SPMD programs further parallelism can be extracted by considering the function parallelism in the programs. Exploiting function parallelism is especially important for parallel systems using the NOW (network of workstations) approach. This is because the high communication overhead in such systems can be hidden with explicit control over the function parallelism. In this paper we describe a general methodology for exploiting function parallelism in SPMD programs and discuss the considerations involved in realizing such parallelism with the multithreading facility supported by most workstations today. The resultant multithreaded parallel program is still coded in the SPMD form. We demonstrate the application of this technique to a PDE solver, which solves a system of linear equations using Jacobi relaxation. 
Experiments on an 8-node NOW confirm that the performance of an SPMD program can be improved further by exploiting its function parallelism.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123705138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determination of an optimal processor allocation in the design of massively parallel processor arrays","authors":"D. Fimmel, R. Merker","doi":"10.1109/ICAPP.1997.651500","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651500","url":null,"abstract":"In this paper we consider the determination of allocation functions as a part of the design of massively parallel processor arrays for algorithms which can be represented as systems of uniform recurrence equations. The objective is to find allocation functions minimizing the necessary chip area for a hardware implementation of the processor array. We propose an algorithm approximately minimizing the number of processors under consideration of the necessary chip area needed to implement the processors of the processor array. The arising optimization problems can be solved using integer linear programming.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121287631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lazy decomposition: a novel technique to control parallel task granularity","authors":"Suntae Hwang, H. Cha","doi":"10.1109/ICAPP.1997.651511","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651511","url":null,"abstract":"This paper introduces a new mechanism for the exposure of large grain parallelism. The scheme performs lazy task creation; inlining all tasks provisionally and extracting parallelism from the inlined information later on demand. However, unlike other mechanisms, the further task demand is satisfied by the next evaluation stream rather than retrospectively reversing the inlining decision of the current stream. The scheme is called lazy decomposition because decomposition itself is throttled rather than just the extraction of a task. Lazy decomposition makes the serial section clearly separated from the parallel section in an evaluation tree for a particular function, and this allows the serial section to adopt a sequential algorithm. The performance improvement is significant in divide-and-conquer applications by adoption of sequential algorithms.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115290902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new heuristic algorithm based on GAs for multiprocessor scheduling with task duplication","authors":"T. Tsuchiya, T. Osada, T. Kikuno","doi":"10.1109/ICAPP.1997.651499","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651499","url":null,"abstract":"In this paper, we propose a new algorithm for scheduling parallel programs represented as directed acyclic graphs onto multiprocessors with communication delays. In such systems, task duplication is known as a useful technique for shortening the length of schedules. The proposed algorithm adopts several heuristics based on GAs as well as task duplication. To apply a GA to scheduling, we design chromosomes using list representation so that each chromosome can uniquely represent a schedule of tasks. We also design genetic operators to control the degree of replication of tasks. Through simulation studies for three kinds of parallel programs under various scheduling conditions, we compare the proposed algorithm with an established algorithm proposed by Kruatrachue. As a result, it is found that the new heuristic algorithm outperforms the previous algorithm especially when communication delays are relatively small.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126869005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A systolic architecture for sorting an arbitrary number of elements","authors":"S. Zheng, S. Olariu, M. C. Pinotti","doi":"10.1109/ICAPP.1997.651484","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651484","url":null,"abstract":"We propose a simple systolic VLSI sorting architecture whose main feature is the pipelined use of a sorting network of fixed I/O size p to sort an arbitrarily large data set of N elements. Our architecture is feasible for VLSI implementation and its time performance is virtually independent of the cost and depth of the underlying sorting network. Specifically, we show that by using our design N elements can be sorted in /spl Theta/(N/p log N/p) time without memory access conflicts. We also show how to use an AT/sup 2/-optimal sorting network of fixed I/O size p to construct a similar systolic architecture that sorts N elements in /spl Theta/(N/p log N/plogp) time.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134576578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simulator construction methodology for the Shiva multiprocessor system","authors":"S. Slomka, K. Sterzl, V. Lakshmi Narasimhan","doi":"10.1109/ICAPP.1997.651490","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651490","url":null,"abstract":"This paper describes a simulator for the Shiva multiprocessor system and the simulator construction methodology (SCM) used in its creation. The SCM, based on the active functional unit (AFU) construct, is a modern SCM which is flexible, accurate, fast, easy to use, capable of dynamic reconfigurability at run-time, and most of all simple and capable of quick simulator construction. The AFU SCM is capable of all these things through the use of object-oriented software techniques. The Shiva simulator constructed using the AFU SCM is program-driven and capable of micro and macro architectural simulation.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133565790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network enabled solvers for scientific computing using the NetSolve system","authors":"H. Casanova, J. Dongarra","doi":"10.1109/ICAPP.1997.651477","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651477","url":null,"abstract":"Agent-based computing is increasingly regarded as an elegant and efficient way of providing access to computational resources. Several metacomputing research projects are using intelligent agents to manage a resource space and to map user computation to these resources in an optimal fashion. Such a project is NetSolve, developed at the University of Tennessee and Oak Ridge National Laboratory. NetSolve provides the user with a variety of interfaces that afford direct access to preinstalled, freely available numerical libraries. These libraries are embedded in computational servers. New numerical functionalities can be integrated easily into the servers by a specific framework. The NetSolve agent manages the coherency of the computational servers. It also uses predictions about the network and processor performances to assign user requests to the most suitable servers. This article reviews some of the basic concepts in agent-based design, discusses the NetSolve project and how its agent enhances flexibility and performance, and provides examples of other research efforts. 
Also discussed are future directions in agent-based computing in general and in NetSolve in particular.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133708540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}