{"title":"A Novel Routing Algorithm for k-Ary n-Cube Interconnection Networks","authors":"E. Demaine, S. Sampalli","doi":"10.1142/S0129053396000070","DOIUrl":"https://doi.org/10.1142/S0129053396000070","url":null,"abstract":"This paper proposes a novel routing algorithm, called the direction-first e-cube, for routing on k-ary n-cube interconnection networks. It is an adaptive, partially minimal algorithm based on the wormhole-routing strategy and effectively extends the basic e-cube technique. It has been proved by a set of theorems that the proposed algorithm is deadlock-, livelock- and starvation-free. In the absence of faults, the algorithm is fully minimal. Even in the presence of faults and network congestion, the number of extra hops required to route a message is minimal. The algorithm is also simple to implement, since it utilizes a small header node of 2 log2 n+n log2k+1 bits. Simulation results are given to validate the proposed algorithm.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115573287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Rotate Sort on Mesh-Connected Computers with Multiple Broadcasting Using Fewer Processors","authors":"Chin-Fu Lin, S. Horng, T. Kao","doi":"10.1142/S0129053395000282","DOIUrl":"https://doi.org/10.1142/S0129053395000282","url":null,"abstract":"In this paper, we first present an O(log n) time sorting algorithm on 3-D mesh-connected computers with multiple broadcasting (abbreviated to MCCMB) using n1/2×n1/2×n1/2 processors. Our algorithm is derived from rotate sort. Further, we also show that the result can be extended to k-dimensional MCCMB of size to sort n data items in O(7k−3 log n) time, for k≥3. The algorithm proposed is optimal speed-up while k is any constant. The contribution of this paper is to show that the proposed algorithm can be run in a higher dimensional MCCMB and using fewer processors but keeps the same time complexity as O(log n).","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116488498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topological Feature Maps on Parallel Computers","authors":"Tao Li, L. Tao","doi":"10.1142/S0129053395000294","DOIUrl":"https://doi.org/10.1142/S0129053395000294","url":null,"abstract":"Analysis of skewed memory schemes for parallel implementation of Kohonen’s topological feature maps are presented in this paper. It is found that linear skewing is more general and more effective for this problem as compared with other memory skewing schemes. Implementations on parallel machines and performance results will also be presented.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133396488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case Study of Parallel Processing: Informatics vs Scientific Computing","authors":"Stephan Waser, H. Burkhart","doi":"10.1142/S0129053395000312","DOIUrl":"https://doi.org/10.1142/S0129053395000312","url":null,"abstract":"Informatics and Scientific Computing approach parallel processing in a different way. We briefly describe the different points of view of both camps. Next we concentrate on a case study in the area of scientific computing. The problem chosen is from Physical Chemistry (self-consistent field computation). We describe the problem, the sequential solution, the parallelization strategy and present the performance values we have achieved. Our implementation is based on a 60-node transputer system, available at the Parallel Processing Laboratory in Basel.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133126590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Image Processing Applications on the MasPAR Massively Parallel Computers","authors":"M. Hamdi, Y. Pan, K. W. Tong","doi":"10.1142/S0129053395000270","DOIUrl":"https://doi.org/10.1142/S0129053395000270","url":null,"abstract":"Image processing applications are suitable candidates for parallelism and have at least in part motivated the design and development of some of the pioneering massively parallel processing systems including the CLIP family, the DAP, the MPP and the GAPP. In this paper, we describe the implementation of various image processing algorithms on the MasPar massively parallel computer system. The suitability of the MasPar for solving image processing algorithms is demonstrated either by parallelizing the algorithms using successful known techniques and/or developing new techniques suitable for the MasPar architecture. We quantitatively evaluate the performance of MasPar in solving these problems. Then, we compare its performance to various related massively parallel architectures. It is shown that the MasPar system compares favorably to these architectures, and is able to execute many fundamental image processing applications in a time amenable to real-time processing. Thus, the MasPar seems to be a promising architecture for massively parallel real-time image processing applications.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116729986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processor Allocation in Extended Hypercube Multiprocessor","authors":"Sumeet Ahuja, A. Sarje","doi":"10.1142/S0129053395000269","DOIUrl":"https://doi.org/10.1142/S0129053395000269","url":null,"abstract":"Processor allocation is an important issue for efficient utilization of a multiprocessor system. Several well-known methods exist for processor allocation in the hypercube multiprocessor. Some disadvantages of the hypercube structure have been overcome in the Extended Hypercube. The paper attempts at showing how the processor allocation schemes of hypercube can be extended to the Extended Hypercube structure. Results of a simulation study are also provided.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131040091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work and Memory-Efficient Parallel Algorithms for the Knapsack Problem","authors":"Afonso Ferreira","doi":"10.1142/S0129053395000324","DOIUrl":"https://doi.org/10.1142/S0129053395000324","url":null,"abstract":"Parallel algorithms for solving a knapsack problem of size n on PRAM and distributed memory machines are presented. The algorithms are work-efficient in the sense that they achieve optimal speedup with regard to the best known solution to this problem. Moreover, they match the best current time/memory/processors tradeoffs, while requiring less memory and/or processors. Since the PRAM is considered mainly as a theoretical model, and we want to produce practical algorithms for the knapsack problem, its solution in distributed memory machines is also studied. For the first time in literature, work-efficient parallel algorithms on local memory — message passing architectures — are given. Time bounds for solving the problem on linear arrays, meshes, and hypercubes are proved.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134099157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Tolerant Systolic Evaluation of Polynomials and Exponentials of Polynomials for equispaced Arguments Using Time Redundancy","authors":"M. Vijay","doi":"10.1142/S0129053395000191","DOIUrl":"https://doi.org/10.1142/S0129053395000191","url":null,"abstract":"Many applications which require high speed evaluation of polynomials and exponentials of polynomials can now be implemented in the hardware very efficiently because of the advances in VLSI technology. Several fast algorithms have been proposed in the recent past for the efficient evaluation of polynomials and exponentials of polynomials for equispaced arguments on uniprocessor systems. In this paper, we consider the problem of organizing this evaluation on VLSI chips in the form of systolic arrays. We present linear fault tolerant systolic arrays which can evaluate the polynomials and exponentials of polynomials of any degree for a large number of equispaced points. These organizations have the main advantage that the interconnections between the processing elements are very regular and simple, and hence are very appropriate for VLSI implementation.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121733306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a Processor Element for a High Performance Massively Parallel SIMD System","authors":"D. Beal, C. Lambrinoudakis","doi":"10.1142/S0129053395000208","DOIUrl":"https://doi.org/10.1142/S0129053395000208","url":null,"abstract":"This paper describes the architecture of the General Purpose with Floating Point support (GPFP) processing element, which uses the expansion of circuitry from VLSI advances to provide on-chip memory and cost-effective extra functionality. A major goal was to accelerate floating point arithmetic. This was combined with architectural aims of cost-effectiveness, achieving the floating-point capability from general-purpose units, and retaining the 1-bit manipulations available in the earlier generation. With a 50 MHz clock each PE is capable of 2.5 MegaFlops. Normalized to the same clock rate, the GPFP PE exceeds first generation PEs by far, namely the DAP by a factor of 50 and the MPP by a factor of 20, and also outperforms the recent MasPar design by a factor of four. A 32×32 GPFP array is capable of up to 2.5 GigaFlops and 6500 MIPS, on 32-bit additions. These speedups are obtained by architectural features rather than increased width of data-handling and are combined with parsimonious use of circuitry compatible with massively parallel fabrication. The GPFP also incorporates Reconfigurable Local Control (RLC), a technique that combines a considerable degree of local autonomy within PEs and microcode flexibility, giving the machine improved general-purpose programmability in addition to floating-point numerical performance.","PeriodicalId":270006,"journal":{"name":"Int. J. 
High Speed Comput.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124243645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulations of Interacting Many Body Systems Using P4","authors":"R. Scalettar, K. Runge, J. Correa, P. Lee, V. Oklobdzija, J. Vujic","doi":"10.1142/S012905339500018X","DOIUrl":"https://doi.org/10.1142/S012905339500018X","url":null,"abstract":"Monte Carlo (MC) and Molecular Dynamics (MD) simulations are powerful tools for understanding the low temperature properties of systems of interacting electrons and phonons in a solid, including the phenomena of magnetism and superconductivity. When mobile electrons are studied, these simulations are currently limited to a few hundred particles, and also largely to “clean” systems where no defects are present. Therefore, more powerful machines and algorithms must be used to address many of the most important issues in the field. In this paper, we present results from using some simple implementations of the p4 parallel programming system on a variety of parallel architectures to conduct MC and MD simulations of one and two dimensional electron-phonon models.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128029307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}