{"title":"A Parallel High-Order Fictitious Domain Approach for Biomechanical Applications","authors":"M. Ruess, V. Varduhn, E. Rank, Z. Yosibash","doi":"10.1109/ISPDC.2012.45","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.45","url":null,"abstract":"The focus of this contribution is on the parallelization of the Finite Cell Method (FCM) applied to biomechanical simulations of human femur bones. The FCM is a high-order fictitious domain method that combines the simplicity of Cartesian grids with the beneficial properties of hierarchical approximation bases of higher order for an increased accuracy and reliability of the simulation model. A pre-computation scheme for the numerically expensive parts of the finite cell model is presented that shifts a significant part of the analysis update to a setup phase of the simulation, thus increasing the update rate of linear analyses with time-varying geometry properties to a range that even allows user-interactive simulations of high quality. Parallelization of both parts, the pre-computation of the model stiffness and the update phase of the simulation, is simplified due to a simple and undeformed cell structure of the computation domain. A shared memory parallelized implementation of the method is presented and its performance is tested for a biomedical application of clinical relevance to demonstrate the applicability of the presented method.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124925303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Load Balancing in Data Grids by Global Load Estimation","authors":"Lukas Rupprecht, Angelika Reiser, A. Kemper","doi":"10.1109/ISPDC.2012.40","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.40","url":null,"abstract":"Peer-to-Peer (P2P) technology can be utilized to combine remote resources and build distributed, high performance database systems, called data grids, which help to handle the rapidly increasing volumes of data produced by disciplines like astrophysics, biology, or geology. One major challenge of data grids is skewed query patterns, which cause load imbalances and heavily diminish performance and availability. To avoid hot spots, sophisticated load balancing techniques are required. We present a dynamic replication strategy which prevents hot spots by dynamically replicating the hot data at different locations. The main questions of such a strategy are when to copy which data to what receivers and when to delete the copies. To answer these questions we propose a low-overhead, decentralized method which is able to deliver a highly accurate estimate of the global load and the single peer loads to all clients. We use that information in an optimization problem to determine the data to be replicated and the optimal replica receivers. A simulated performance evaluation based on a real-world scenario demonstrates the effectiveness of the approach.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114085053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel and Distributed Surrogate Model Implementation for Computational Steering","authors":"D. Butnaru, G. Buse, D. Pflüger","doi":"10.1109/ISPDC.2012.35","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.35","url":null,"abstract":"Understanding the influence of multiple parameters in a complex simulation setting is a difficult task. In the ideal case, the scientist can freely steer such a simulation and is immediately presented with the results for a certain configuration of the input parameters. Such an exploration process is however not possible if the simulation is computationally too expensive. For these cases we present in this paper a scalable computational steering approach utilizing a fast surrogate model as substitute for the time-consuming simulation. The surrogate model we propose is based on the sparse grid technique, and we identify the main computational tasks associated with its evaluation and its extension. We further show how distributed data management combined with the specific use of accelerators allows us to approximate and deliver simulation results to a high-resolution visualization system in real-time. This significantly enhances the steering workflow and facilitates the interactive exploration of large datasets.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121950020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resolving Neighbourhood Relations in a Parallel Fluid Dynamic Solver","authors":"J. Frisch, R. Mundani, E. Rank","doi":"10.1109/ISPDC.2012.43","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.43","url":null,"abstract":"Computational Fluid Dynamics simulations require an enormous computational effort if physically reasonable accuracy is to be reached. Therefore, a parallel implementation is inevitable. This paper describes the basics of our implemented fluid solver with a special focus on the hierarchical data structure, unique cell and grid identification, and the neighbourhood relations between grids on different processes. A special server concept keeps track of every grid over all processes while minimising data transfer between the nodes.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131028573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Hybrid Grids for Mantle Convection: A First Study","authors":"B. Gmeiner, M. Mohr, U. Rüde","doi":"10.1109/ISPDC.2012.49","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.49","url":null,"abstract":"In this article we consider the application of the Hierarchical Hybrid Grid Framework (HHG) to the geodynamical problem of simulating mantle convection. We describe the generation of a refined icosahedral grid and a further subdivision of the resulting prisms into tetrahedral elements. Based on this mesh, we present performance results for HHG and compare these to TERRA, likewise a Finite Element program, which is a well-known code for mantle convection using a matrix-free representation of the stiffness matrix. In our analysis we consider the most time-consuming part of TERRA's solution algorithm and evaluate it in a strong scaling setup. Finally we present strong and weak scaling results for HHG to verify its parallel concepts, algorithms and grid flexibility on Jugene.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122228572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive Schemes for QoS Awareness of Publish/Subscribe Systems on MANET","authors":"Imene Lahyani, M. Gassara, M. Jmaiel, C. Chassot","doi":"10.1109/ISPDC.2012.11","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.11","url":null,"abstract":"In this paper, we propose a failure prediction methodology for predicting quality of service (QoS) degradation in publish/subscribe systems on MANET. We propose to use the Auto Regressive Integrated Moving Average (ARIMA) method to predict failure occurrence in the system and to provide optimal QoS provision of applications. Besides, our forecasting algorithm looks for the source behind QoS degradation using the Correlation method. Simulations are performed to demonstrate the efficiency of the proposed approach. A comparison shows that our proposal outperforms the Auto Regression (AR) based prediction approach.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124877230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators Based on a Domain-Specific Language for Medical Imaging","authors":"Richard Membarth, Frank Hannig, J. Teich, M. Körner, Wieland Eckert","doi":"10.1109/ISPDC.2012.36","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.36","url":null,"abstract":"An efficient memory bandwidth utilization for GPU accelerators is crucial for memory bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth since only a few operations are performed per pixel. For such kernels only a fraction of the compute power provided by GPU accelerators can be exploited and performance is predetermined by memory bandwidth. As a remedy, this paper investigates the optimal utilization of available memory bandwidth by means of increasing in-flight memory transactions. Instead of doing this manually for different GPU accelerators, the required CUDA and OpenCL code is automatically generated from descriptions in a Domain-Specific Language (DSL) for the considered application domain. Moreover, the DSL is extended to also support global reduction operators. We show that the generated target-specific code improves bandwidth utilization for memory-bound kernels significantly. Moreover, competitive performance compared to the GPU back end of the widely used image processing library OpenCV can be achieved.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"209 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123037117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling Architecturally-Supported Regions with Precedence-Based Priorities","authors":"L. Masko, M. Tudruj","doi":"10.1109/ISPDC.2012.46","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.46","url":null,"abstract":"Multicore processor technology provides new possibilities in the domain of cluster-based systems. One such viable architecture is a globally interconnected system of multicore executive modules. The multicore modules can be general purpose or can be architecturally supported to enhance performance for special kinds of functions, such as those included in parallel numerical libraries. The paper is concerned with scheduling methods for modular multicore systems based on global interconnections. A new scheduling algorithm is reported which enables scheduling of programs represented by macro data flow graphs onto a system of globally interconnected general purpose and architecturally supported modules. The algorithm applies heuristics based on weighted activation graphs of architecturally supported program regions. Experimental results show the advantages of the proposed approach compared with simpler scheduling algorithms based only on topological graph properties.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132060930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching Parallel Programming Models on a Shallow-Water Code","authors":"Alexander Breuer, M. Bader","doi":"10.1109/ISPDC.2012.48","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.48","url":null,"abstract":"We present a software package that supports teaching different parallel programming models in a computational science and engineering context. It implements a Finite Volume solver for the shallow water equations, with application to tsunami simulation in mind. The numerical model is kept simple, using patches of Cartesian grids as computational domain, which can be connected via ghost layers. The Finite Volume method is restricted to piecewise constant approximation in each grid cell, but the computation of fluxes between cells can be based on the simple Lax-Friedrichs method, as well as on versatile approximate Riemann solvers, which allows realistic simulations. We present how this code can be used to study parallelization with CUDA, MPI, OpenMP, and hybrid approaches - and is useful for both introductory lectures in parallel computing and more advanced courses.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122331340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Allocation of Communication-Intensive Applications in Clouds Using Time-Related Information","authors":"A. Stefano, G. Morana, D. Zito","doi":"10.1109/ISPDC.2012.18","DOIUrl":"https://doi.org/10.1109/ISPDC.2012.18","url":null,"abstract":"The optimal allocation of Communication-Intensive Applications is a well-known complex issue in Clouds. Applications of this kind, due to the strong impact of communications on their performance, require not only that their tasks are allocated on resources able to satisfy their computational requirements but also that the distance among these resources, in terms of communication delay or latency, is as small as possible. The allocation strategies currently used, based on a static vision of resources' status, are not suitable for managing effectively the peculiarities of these applications. In this work we propose an innovative allocation strategy that, using information about the sequence of their internal interactions, improves the deployment of Communication-Intensive Applications on available resources. In particular, this strategy allows reducing the number of resources needed for executing each application and, very importantly, it allows reducing the influence of each application over the performance of the other ones running on the same cloud.","PeriodicalId":287900,"journal":{"name":"2012 11th International Symposium on Parallel and Distributed Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123650401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}