J. P. B. Angeli, A. Valli, N. C. Reis, A. D. Souza
{"title":"Finite difference simulations of the Navier-Stokes equations using parallel distributed computing","authors":"J. P. B. Angeli, A. Valli, N. C. Reis, A. D. Souza","doi":"10.1109/CAHPC.2003.1250333","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250333","url":null,"abstract":"We discuss the implementation of a numerical algorithm for simulating incompressible fluid flows based on the finite difference method and designed for parallel computing platforms with distributed memory, particularly clusters of workstations. The solution algorithm for the Navier-Stokes equations utilizes an explicit scheme for pressure and an implicit scheme for velocities, i.e., the velocity field at a new time step can be computed once the corresponding pressure is known. The parallel implementation is based on domain decomposition, where the original calculation domain is decomposed into several blocks, each of which is assigned to a separate processing node. All nodes then execute computations in parallel, each node on its associated subdomain. The parallel computations include initialization, coefficient generation, linear solution on the subdomain, and inter-node communication. The exchange of information across the subdomains, or processors, is achieved using the Message Passing Interface (MPI) standard. The use of MPI ensures portability across different computing platforms ranging from massively parallel machines to clusters of workstations. The execution time and speed-up are evaluated by comparing the performance of different numbers of processors. The results indicate that the parallel code can significantly improve prediction capability and efficiency for large-scale simulations.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128915761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast parallel FFT on a reconfigurable computation platform","authors":"A. Kamalizad, Chengzhi Pan, N. Bagherzadeh","doi":"10.1109/CAHPC.2003.1250345","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250345","url":null,"abstract":"We present an implementation of a very fast parallel complex FFT on M2, the second generation of the MorphoSys reconfigurable computation platform, which targets streamed applications such as multimedia and DSP. The proposed mapping comprises fast presorting, cascaded radix-2 stages, and postreordering. Data and twiddle factors are 16-bit real and 16-bit imaginary in 2's complement format, and scaling is performed to avoid overflow. The mapping is tested on our cycle-accurate simulator, \"mulate\", and the performance is encouragingly better than that of other architectures such as Imagine and VIRAM. Moreover, the performance scales with FFT size. Since there is no functionality specifically tailored to FFT, the results demonstrate the capability of the MorphoSys architecture to extract parallelism from streamed applications. Further rationales are given based on the concepts of scalar operand networks and memory hierarchy.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123913337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gianluca Varenni, M. Baldi, Loris Degioanni, Fulvio Risso
{"title":"Optimizing packet capture on symmetric multiprocessing machines","authors":"Gianluca Varenni, M. Baldi, Loris Degioanni, Fulvio Risso","doi":"10.1109/CAHPC.2003.1250328","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250328","url":null,"abstract":"Traffic monitoring and analysis based on general-purpose systems with high-speed interfaces, such as Gigabit Ethernet and 10 Gigabit Ethernet, requires carefully designed software in order to achieve the needed performance. One approach to attaining such performance relies on deploying multiple processors. This work analyses some general issues in multiprocessor systems that are particularly critical in the context of packet capture and network monitoring applications. More importantly, a new algorithm is proposed to coordinate multiple producers concurrently accessing a shared buffer, which is instrumental in packet capture on symmetrical multiprocessor machines.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"2011 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125633076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francisco Heron de Carvalho Junior, R. Lins, N. Quental
{"title":"On the implementation of SPMD applications using Haskell#","authors":"Francisco Heron de Carvalho Junior, R. Lins, N. Quental","doi":"10.1109/CAHPC.2003.1250321","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250321","url":null,"abstract":"Commodity-built clusters, a low-cost alternative for distributed parallel processing, have brought high-performance computing to a wide range of users. However, the existing widespread tools for distributed parallel programming, such as message passing libraries, do not meet the new software engineering requirements that have emerged due to the increase in complexity of applications. Haskell# is a parallel programming language intended to reconcile higher abstraction and modularity with scalable performance. We demonstrate the use of Haskell# in programming three SPMD benchmark programs, which have lower-level MPI implementations available.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131668277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rafael Ennes Silva, Delcino Picinin, Marcos E. Barreto, R. Ávila, T. A. Diverio, P. Navaux
{"title":"Performance analysis of DECK collective communication service","authors":"Rafael Ennes Silva, Delcino Picinin, Marcos E. Barreto, R. Ávila, T. A. Diverio, P. Navaux","doi":"10.1109/CAHPC.2003.1250322","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250322","url":null,"abstract":"Collective communication is very useful for parallel applications, especially those in which matrix and vector data structures need to be manipulated by a group of processes. We present a performance analysis of collective communication primitives designed for the DECK parallel programming environment, with the aid of different numerical methods used to solve hydrodynamics and mass transportation models.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116333451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paulo Vicente Capellotto Costa, S. Zorzo, H. Guardia
{"title":"ProGrid: a proxy-based architecture for grid operation and management","authors":"Paulo Vicente Capellotto Costa, S. Zorzo, H. Guardia","doi":"10.1109/CAHPC.2003.1250327","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250327","url":null,"abstract":"We introduce the ProGrid system, an architecture for computational grids, whose communication and resource management infrastructure is used transparently by the applications. Unlike other grid approaches, either application-centric or system-centric, the model relies on the use of proxy servers to perform additional communications and authentication procedures on behalf of client applications. The purpose of this mechanism is to enable parallel applications to be executed in geographically distributed environments interlinked by an open communication network, such as the Internet, meeting the security requisites desirable for computational grids. Among the common services of a grid, we focus on safe communication and the controlled sharing of available resources. To identify the resources, standards under development are considered for the specification of objects in grids. We also discuss an extension of the functionality of proxy servers to include support for the standardized management of the grid and of the available objects.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"796 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123909031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cristina Boeres, Alexandre A. B. Lima, Vinod E. F. Rebello
{"title":"Hybrid task scheduling: integrating static and dynamic heuristics","authors":"Cristina Boeres, Alexandre A. B. Lima, Vinod E. F. Rebello","doi":"10.1109/CAHPC.2003.1250339","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250339","url":null,"abstract":"Researchers are constantly looking for ways to improve the execution time of parallel applications on distributed systems. Although compile-time static scheduling heuristics employ complex mechanisms, the quality of their schedules is handicapped by estimated run-time costs. On the other hand, while dynamic schedulers use actual run-time costs, they have to be of low complexity in order to reduce the scheduling overhead. We investigate the viability of integrating these two approaches into a hybrid scheduling framework. The relationship between static schedulers, dynamic heuristics and scheduling events is examined. The results show that a hybrid scheduler can indeed improve the schedules produced by good traditional static list scheduling algorithms.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"00 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123977931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Almási, Leonardo R. Bachega, S. Chatterjee, Manish Gupta, D. Lieber, X. Martorell, J. Moreira
{"title":"Enabling dual-core mode in BlueGene/L: challenges and solutions","authors":"G. Almási, Leonardo R. Bachega, S. Chatterjee, Manish Gupta, D. Lieber, X. Martorell, J. Moreira","doi":"10.1109/CAHPC.2003.1250317","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250317","url":null,"abstract":"BlueGene/L is a massively parallel computer system with 65536 dual-processor compute nodes. The peak performance of BlueGene/L is in excess of 360 TFLOP/s if both processor cores in a node are used for computation. The main challenge of deploying this dual-core mode of operation is that the L1 caches in each core are not hardware coherent. This forces a software-based approach to cache coherence and guides our design of a programming model for dual-core mode. We describe the design, implementation, and performance evaluation of system software for enabling the use of dual-core mode on BlueGene/L. Our preliminary performance results show that our approach to dual-core mode is effective for key numerical kernels.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129919022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Pereira, P. Vargas, F. França, M. D. Castro, I. Dutra
{"title":"Applying scheduling by edge reversal to constraint partitioning","authors":"M. Pereira, P. Vargas, F. França, M. D. Castro, I. Dutra","doi":"10.1109/CAHPC.2003.1250331","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250331","url":null,"abstract":"Scheduling by edge reversal (SER) is a fully distributed scheduling mechanism based on the manipulation of acyclic orientations of a graph. This work uses SER to perform constraint partitioning of constraint satisfaction problems (CSP). In order to apply the SER mechanism, the graph representing the constraints must receive an acyclic orientation. Since obtaining an optimal acyclic orientation is an NP-hard problem, we study three nondeterministic strategies known in the literature: Alg-Neigh, Alg-Edges, and Alg-Colour. We implemented the three algorithms and the SER scheduling mechanism, applying them to the CSP constraint networks generated from three applications. Our results show that SER has great potential to produce a good partitioning of the constraint graphs.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125661579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"X4CP32: a new parallel/reconfigurable general-purpose processor","authors":"R. Soares, A. Azevedo, Ivan Saraiva Silva","doi":"10.1109/CAHPC.2003.1250346","DOIUrl":"https://doi.org/10.1109/CAHPC.2003.1250346","url":null,"abstract":"The X4CP32 is a parallel/reconfigurable microprocessor with two programming levels. Although it is a general-purpose microprocessor, it has the reliable performance of a reconfigurable architecture. We present its architecture and programming levels, and discuss the powerful interaction between parallel programming and reconfiguration. We show two performance-optimized implementations of matrix multiplication using both parallel and reconfigurable paradigms and a parallel implementation of miner intelligent agents.","PeriodicalId":433002,"journal":{"name":"Proceedings. 15th Symposium on Computer Architecture and High Performance Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123722514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}