{"title":"Fast parallel algorithms for computing trigonometric sums","authors":"Przemysław Stpiczyński","doi":"10.1109/PCEE.2002.1115276","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115276","url":null,"abstract":"In this paper we present new parallel versions of sequential Goertzel and Reinsch algorithms for calculating trigonometric sums. The new algorithms use a recently introduced, very efficient BLAS-based algorithm for solving linear recurrence systems with constant coefficients. To achieve their portability across different shared-memory parallel architectures, the algorithms have been implemented in Fortran 77 and OpenMP. We also present experimental results performed on a two processor Pentium III computer running under the Linux operating system with Atlas as an efficient implementation of BLAS. The new algorithms are up to 60-90% faster than the equivalent sequential Goertzel and Reinsch algorithms, even on one processor.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"292 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116873811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic algorithm transforms for reconfigurable real-time audio coding processor","authors":"A. Petrovsky, A. Petrovsky","doi":"10.1109/PCEE.2002.1115317","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115317","url":null,"abstract":"In this paper, dynamic algorithm transforms (DAT) for reconfigurable real-time processor for audio coding based on the adaptive wavelet packet decomposition are presented. The DAT technique is used to constrain a minimum cost subband decomposition of wavelet transform by maximizing the minimum masking threshold (which is limited by the perceptual entropy) in every subband for the given embedded processor architecture and temporal resolution.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127338183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Allocation strategies of user requests in Web server clusters","authors":"H. Krawczyk, Arkadiusz Urbaniak","doi":"10.1109/PCEE.2002.1115244","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115244","url":null,"abstract":"Four architectures for improving efficiency of Web servers on the base of cluster computing are presented. Three categories of load balancing strategies are distinguished: DNS, dispatcher and server oriented ones. The PVM simulator is implemented to define and model such strategies.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131305893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multigrain automatic parallelization in Japanese Millennium Project IT21. Advanced Parallelizing Compiler","authors":"H. Kasahara, M. Obata, K. Ishizaka, K. Kimura, H. Kaminaga, H. Nakano, Kouhei Nagasawa, Akiko Murai, H. Itagaki, J. Shirako","doi":"10.1109/PCEE.2002.1115213","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115213","url":null,"abstract":"This paper describes OSCAR multigrain parallelizing compiler which has been developed in Japanese Millennium Project IT21 \"Advanced Parallelizing Compiler\" project and its performance on SMP machines. The compiler realizes multigrain parallelization for chip-multiprocessors to high-end servers. It hierarchically exploits coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block in addition to loop parallelism. Also, it globally optimizes cache use over different loops, or coarse grain tasks, based on the data localization technique to reduce memory access overhead. Current performance of OSCAR compiler for SPEC95fp is evaluated on different SMPs. For example, it gives us 3.7 times speedup for HYDRO2D, 1.8 times for SWIM, 1.7 times for SU2COR, 2.0 times for MGRID, 3.3 times for TURB3D on 8 processor IBM RS6000, against XL Fortran compiler ver 7.1 and 4.2 times speedup for SWIM and 2.2 times speedup for TURB3D on 4 processor Sun Ultra80 workstation against Forte6 update 2.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123036281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new version of conjugate gradient method parallel implementation","authors":"R. Bycul, A. Jordan, M. Cichomski","doi":"10.1109/PCEE.2002.1115282","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115282","url":null,"abstract":"In the article the authors describe an idea of parallel implementation of a conjugate gradient method in a heterogeneous PC cluster and a supercomputer Hitachi SR-2201. The new version of algorithm implementation differs from the one applied earlier (Jordan and Bycul, 2002), because it uses a special method for storing sparse coefficient matrices: only non-zero elements are stored and taken into account during computations, so that the sparsity of the coefficient matrix is taken full advantage of. The article includes a comparison of the two versions. A speedup of the parallel algorithm has been examined for three different cases of coefficient matrices resulting in solving different physical problems. The authors have also investigated a preconditioning method, which uses the inversed diagonal of the coefficient matrix, as a preconditioning matrix.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114605977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adding advanced synchronization to processes in GRADE","authors":"J. Borkowski, D. Kopanski, M. Tudruj","doi":"10.1109/PCEE.2002.1115222","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115222","url":null,"abstract":"New synchronization mechanisms using asynchronous computation activation and cancellation, based on state monitoring of a parallel application, are presented. The paper proposes how to integrate the mechanisms with the GRADE graphical parallel program design environment. Necessary enhancements to the GUI along with the semantics and possible applications are presented. Efficient implementation methods for the proposed synchronization are discussed.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116052531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnostics of a selected AC drive using parallel processing","authors":"W. Juszczyk, Z. Kich, M. Zajac","doi":"10.1109/PCEE.2002.1115301","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115301","url":null,"abstract":"It is assumed in this paper that the aim of power electronic converter diagnostics is to find out the actual converter's state and to automatically formulate a diagnosis to correct it. The result of the diagnostics' operation was either confirmation of the converter's correctness or detection and isolation of faults. To formulate the diagnosis the paper uses fast orthogonal transforms. A direct frequency converter is considered as an example in which the diagnostics was based on using an algorithm of a fast Haar transform. Control of the converter and its diagnostics were set up by employing parallel processing, which enabled real-time requirements to be met.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127457952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Configurable microcontroller array","authors":"O. Maslennikov, Juri Shevtshenko, A. Sergyienko","doi":"10.1109/PCEE.2002.1115196","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115196","url":null,"abstract":"In this paper, a configurable microcontroller array based on the i8051 processor unit (PU) architecture is proposed. The use of the well-known PU architecture simplifies the application programming. The designed microcontroller PU core has a 6 times higher instruction implementation speed and a more than 2.5 times higher clock frequency than the original microcontroller. The proposed technique of mapping the program into configurable hardware showed a 1.5-2-fold hardware minimization. It shows an effective way to speed up the implementation of both computing and control intensive algorithms. The proposed array is very useful in applications where logic intensive calculations or high speed byte handling computations are in demand. Examples of such applications are homomorphic image processing, pattern recognition, genetic algorithms, neural nets, etc.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128626345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving the flow shop problem by parallel tabu search","authors":"W. Bożejko, M. Wodecki","doi":"10.1109/PCEE.2002.1115237","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115237","url":null,"abstract":"We present a parallel tabu search algorithm for the permutation flow shop sequencing problem with the objective of minimizing the flowtime. We propose a neighbourhood using so-called blocks of jobs on a critical path and a backtrace jump method. By computer simulations it is shown that the performance of the proposed algorithm is comparable with the random heuristic technique discussed in the literature. Another interesting property is the fact that the speedup of the parallel implementation is equal to or even greater than p, where p is the number of processors.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133317832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experimental checking of fault susceptibility in a parallel algorithm","authors":"A. Derezińska, J. Sosnowski","doi":"10.1109/PCEE.2002.1115193","DOIUrl":"https://doi.org/10.1109/PCEE.2002.1115193","url":null,"abstract":"We deal with the problem of analyzing fault susceptibility of a parallel algorithm designed for a multiprocessor array (MIMD structure). This algorithm realizes quite a complex communication protocol in the system. We present an original methodology of the analysis based on the use of a software implemented fault injector. The considered algorithm is modeled as a multithreaded application. The experimental setup and results are presented and discussed. The performed experiments proved relatively high natural robustness of the analyzed algorithm and showed further possibilities of its improvement.","PeriodicalId":444003,"journal":{"name":"Proceedings. International Conference on Parallel Computing in Electrical Engineering","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133944240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}