{"title":"A fast parallel sorting algorithm on the k-dimensional reconfigurable mesh","authors":"Ju-wook Jang, Kichul Kim","doi":"10.1109/ICAPP.1997.651519","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651519","url":null,"abstract":"We presents a new parallel sorting algorithm on the k-dimensional reconfigurable mesh which is a generalized version of the well-studied (two dimensional) reconfigurable mesh. We introduce a new mapping technique which combines the enlarged bandwidth of the multidimensional mesh and the feature of the reconfigurable mesh. Using our mapping technique, we show that N/sup k/ numbers can be sorted in O(4/sup k/) (constant time for small k) time on a k+1 dimensional reconfigurable mesh of size k+1 times N/spl times/N/spl times/.../spl times/N. In addition, it is shown that the number of 1's in a 0/1 array of k times size N/spl times/N/spl times/.../spl times/N can be computed in O(log* N+log k) time on reconfigurable k times mesh of size N/spl times/N/spl times/.../spl times/N.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126699787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelization of the H.261 video coding algorithm on the IBM SP2(R) multiprocessor system","authors":"N. Yung, K. Leung","doi":"10.1109/ICAPP.1997.651523","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651523","url":null,"abstract":"In this paper, the parallelization of the H.261 video coding algorithm on the IBM SP2 multiprocessor system is described. Based on domain decomposition as a framework, data partitioning, data dependencies and communication issues are carefully assessed. From these, two parallel algorithms were developed. The first one maximizes processor utilization and the second one minimizes communications. Our analysis shows that the first algorithm exhibits poor scalability and high communication overhead; and the second algorithm exhibits good scalability and low communication overhead. A best median speed up of 13.72 or 11 frames/sec was achieved on 24 processors.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123367540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelization of IP-packet filter rules","authors":"Takeshi Miei, M. Maruyama, T. Ogura, N. Takahashi","doi":"10.1109/ICAPP.1997.651506","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651506","url":null,"abstract":"A compiler for parallelizing IP-packet filter rules is presented which will improve network security and reduce packet-forwarding performance degradation. It analyzes the interdependence of packet-filtering rules specified by a network administrator and translates them into an intermediate program whose instructions can be executed in parallel. Three types of compiler operations are introduced: division is used to divide the rules into parallel expressions, simplification is used to simplify redundant rules, deletion is used to delete infeasible rules.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"400 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122856083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient run-time scheduling for parallelizing partially parallel loops","authors":"Tsung-Chuan Huang, Po-Hsueh Hsu, Tze-Nan Sheng","doi":"10.1109/ICAPP.1997.651508","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651508","url":null,"abstract":"We propose an efficient run-time technique to find an optimal parallel execution schedule for partially parallel loops in which synchronizations between iterations are needed to ensure correct program semantics. For efficiency, we combine conventional mark phase and scheduler phase into a single parallel scheduler. The scheduler divides the loop iterations into several chunks then executes the iterations in one chunk in parallel. Our scheme not only runs fast but also achieves an optimal schedule. In addition, an atomic bit-vector operation is introduced to avoid global synchronization overhead and ensure the larger wavefront number is kept when the wavefront number of an iteration will be concurrently updated during scheduling.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134458903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple dependent queries execution using critical path scheduling in parallel databases","authors":"K.H. Liu, C. Leung, Y. Jiang","doi":"10.1109/ICAPP.1997.651534","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651534","url":null,"abstract":"Multiple processors are employed to improve the performance of database systems and the parallelism can be exploited at three levels in query processing: intra-operation, inter-operation, and inter-query parallelism. Intra-operation and inter-operation parallelism are also called intra-query parallelism which has been studied extensively. In contrast, inter-query parallelism has received little attention particularly for multiple dependent queries. We develop a decompression algorithm, CPS, for coping with multiple dependent queries which are represented by a directed graph, and the algorithm makes use of the activity analysis of critical path analysis, and the resource scheduling and levelling of project management. A simulation study has been conducted and the results show that the proposed algorithm outperforms other existing methods and is able to provide a global optimal solution when the number of processors available is sufficient.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132468532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fibre channel-based architecture for Internet multimedia server clusters","authors":"Shenze Chen, M. Thapar","doi":"10.1109/ICAPP.1997.651512","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651512","url":null,"abstract":"In this paper, we present a cluster architecture for Internet multimedia servers, which uses the Fibre Channel (FC) technology to overcome some of the shortcomings of existing architectures. We also explore the design issues of an FC-based multimedia server cluster. A significant advantage of the FC-based cluster is that it allows physical storage attachment to the interconnect. Because of this feature, FC-based clusters will change the fundamental data-sharing paradigm of existing clusters by eliminating remote data accesses in a cluster. Many aspects of this architecture are critical to real-time multimedia applications, such as audio and video services.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114803689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Petriu, A. Guergachi, G. Patry, L. Zhao, D. Petriu, G. Vukovich
{"title":"Artificial neural architecture for real time modelling applications","authors":"E. Petriu, A. Guergachi, G. Patry, L. Zhao, D. Petriu, G. Vukovich","doi":"10.1109/ICAPP.1997.651529","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651529","url":null,"abstract":"This paper presents the random-pulse machine concept and shows how it can be used for the modular design of artificial neural networks. Random-pulse machines deal with analog variables represented by the mean rate of random-pulse streams and use simple digital technology to perform arithmetic and logic operations. As an application example, a NN is proposed for modeling of the activated sludge wastewater treatment plants.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116420418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Wittenburg, M. Ohmacht, J. Kneip, W. Hinrichs, P. Pirsch
{"title":"HiPAR-DSP: a parallel VLIW RISC processor for real time image processing applications","authors":"J. Wittenburg, M. Ohmacht, J. Kneip, W. Hinrichs, P. Pirsch","doi":"10.1109/ICAPP.1997.651487","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651487","url":null,"abstract":"Derived from a thorough analysis of a wide class of image processing algorithms' properties, a parallel RISC architecture has been developed. The architecture gains performance from data level parallelism as well as from instruction level parallelism. From the beginning of the concept phase, high-level programming capabilities have been one of the major design goals. Thus, there has been a steady interaction between the design of the software development toolkit-optimizing assembler and C++ compiler-and the architecture itself. The RISC-typical register files are one of the most critical elements as well concerning die size and clock frequency as the assembler's ability in VLIW scheduling. Running at 100 MHz (200 mm/sup 2/, 0.35 /spl mu/m CMOS) the processor reaches a sustained performance of more than 2 GOPS for a wide range of image processing algorithms.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122737470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shadow Stacks-a hardware-supported DSM for objects of any granularity","authors":"S. Groh, M. Pizka, J. Rudolph","doi":"10.1109/ICAPP.1997.651493","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651493","url":null,"abstract":"This paper presents a new Distributed Shared Memory (DSM) management concept that is integrated into a scalable distributed virtual memory management technique and circumvents false sharing while still preserving simplicity to the application level. Objects defined as usual by variables in the declaration part of functions are made sharable among threads executing in the distributed environment. These objects of varying granularity and with different consistency requirements are managed separately to avoid false sharing. Consistency is enforced at runtime by a distributed manager-agent architecture, that supports automatic and dynamic selection of an adequate coherence protocol per object. To provide efficiency, the implementation of the Shadow Stacks concept is based on the exploitation of the page fault mechanism provided by of the shelf hardware.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128578821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual parallel processors","authors":"C. Dick, F. Harris","doi":"10.1109/ICAPP.1997.651485","DOIUrl":"https://doi.org/10.1109/ICAPP.1997.651485","url":null,"abstract":"The introduction of SRAM-based field programmable gate arrays (FPGAs) has opened-up a new dimension to parallel computing architectures. This paper describes an alternative approach to parallel computing-reconfigurable or virtual parallel processing (VPP). Rather than mapping an application onto a given parallel machine, the VPP approach synthesizes the appropriate type and number of processing elements, as well as the interconnection topology, that is optimal for the application. For each application, configuration data is downloaded to the machine that personalizes the hardware for the task at hand. The paper provides a brief description of the authors reconfigurable computer, Archimedes. The benefits of the VPP approach are highlighted by an example application-the 2-D FFT. A novel parallel implementation of a polynomial transform based 2-D transform is described and compared to results for distributed memory parallel machines that have been reported in the literature. The comparison highlights the computational advantage provided by reconfigurable computing.","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129402902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}