{"title":"Efficient Runtime Environment for Coupled Multi-physics Simulations: Dynamic Resource Allocation and Load-Balancing","authors":"S. Ko, Nayong Kim, Joohyun Kim, A. Thota, S. Jha","doi":"10.1109/CCGRID.2010.107","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.107","url":null,"abstract":"Coupled Multi-Physics simulations, such as hybrid CFD-MD simulations, represent an increasingly important class of scientific applications. Often the physical problems of interest demand the use of high-end computers, such as TeraGrid resources, which are often accessible only via batch-queues. Batch-queue systems are not developed to natively support the coordinated scheduling of jobs – which in turn is required to support the concurrent execution required by coupled multi-physics simulations. In this paper we develop and demonstrate a novel approach to overcome the lack of native support for the coordinated job submission requirement associated with coupled runs. We establish the performance advantages arising from our solution, which is a generalization of the Pilot-Job concept – which in and of itself is not new, but is being applied to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem, but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure agnostic implementation, and which can seamlessly execute and utilize distributed resources.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124977535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Lightweight Approach to Use Grid Services with Grid Widgets on Grid WebOS","authors":"Yi-Lun Pan, Chang-Hsing Wu, Chia-Yen Liu, Hsi-En Yu, Weicheng Huang","doi":"10.1109/CCGRID.2010.25","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.25","url":null,"abstract":"To bridge the gap between the computing grid environment and users, various Grid Widgets are developed by the Grid development team in the National Center for High-performance Computing (NCHC). These widgets are implemented to provide users with seamless and scalable access to Grid resources. Currently, this effort integrates the de facto Grid middleware, Web-based Operating System (WebOS), and an automatic resource allocation mechanism to form a virtual computer in a distributed computing environment. With the capability of automatic resource allocation and the feature of dynamic load prediction, the Resource Broker (RB) improves the performance of the dynamic scheduling over conventional scheduling policies. With this extremely lightweight and flexible approach to acquire Grid services, the barrier for users to access geographically distributed heterogeneous Grid resources is largely reduced. The Grid Widgets can also be customized and configured to meet the demands of the users.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121904228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiments with Memory-to-Memory Coupling for End-to-End Fusion Simulation Workflows","authors":"C. Docan, Fan Zhang, M. Parashar, J. Cummings, N. Podhorszki, S. Klasky","doi":"10.1109/CCGRID.2010.101","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.101","url":null,"abstract":"Scientific applications are striving to accurately simulate multiple interacting physical processes that comprise complex phenomena being modeled. Efficient and scalable parallel implementations of these coupled simulations present challenging interaction and coordination requirements, especially when the coupled physical processes are computationally heterogeneous and progress at different speeds. In this paper, we present the design, implementation and evaluation of a memory-to-memory coupling framework for coupled scientific simulations on high-performance parallel computing platforms. The framework is driven by the coupling requirements of the Center for Plasma Edge Simulation, and it provides simple coupling abstractions as well as efficient asynchronous (RDMA-based) memory-to-memory data transport mechanisms that complement existing parallel programming systems and data sharing frameworks. The framework enables flexible coupling behaviors that are asynchronous in time and space, and it supports dynamic coupling between heterogeneous simulation processes without enforcing any synchronization constraints. We evaluate the performance and scalability of the coupling framework using a specific coupling scenario, on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123916973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elastic Site: Using Clouds to Elastically Extend Site Resources","authors":"Paul Marshall, K. Keahey, Timothy Freeman","doi":"10.1109/CCGRID.2010.80","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.80","url":null,"abstract":"Infrastructure-as-a-Service (IaaS) cloud computing offers new possibilities to scientific communities. One of the most significant is the ability to elastically provision and relinquish new resources in response to changes in demand. In our work, we develop a model of an “elastic site” that efficiently adapts services provided within a site, such as batch schedulers, storage archives, or Web services to take advantage of elastically provisioned resources. We describe the system architecture along with the issues involved with elastic provisioning, such as security, privacy, and various logistical considerations. To avoid over- or under-provisioning the resources we propose three different policies to efficiently schedule resource deployment based on demand. We have implemented a resource manager, built on the Nimbus toolkit to dynamically and securely extend existing physical clusters into the cloud. Our elastic site manager interfaces directly with local resource managers, such as Torque. We have developed and evaluated policies for resource provisioning on a Nimbus-based cloud at the University of Chicago, another at Indiana University, and Amazon EC2. We demonstrate a dynamic and responsive elastic cluster, capable of responding effectively to a variety of job submission patterns. We also demonstrate that we can process 10 times faster by expanding our cluster up to 150 EC2 nodes.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"343 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124313818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Analysis of Diffusion Tensor Imaging in an Academic Production Grid","authors":"D. Krefting, R. Lützkendorf, Kathrin Peter, J. Bernarding","doi":"10.1109/CCGRID.2010.21","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.21","url":null,"abstract":"Analysis of diffusion weighted magnetic resonance images serves increasingly for non-invasive tracking of nerve fibers in the human brain, both in clinical diagnosis and basic research. Diffusion-tensor imaging (DTI) enables in-vivo research on the internal structure of the central nervous system, an estimation of the interconnection of functional areas and diagnosis of brain tumors and de-myelinating diseases. But modeling the local diffusion parameters is computationally expensive and on standard desktop computers runtimes of up to days are common. A workflow based grid implementation of the algorithm with slice-based parallelization has shown significant speedup. However, in production use, the implementation was frequently delayed and even failed, discouraging the medical collaborators from taking up the management of the data processing themselves. Therefore a comprehensive analysis of possible sources for errors and delays as well as their real impact in the respective infrastructure is vital to enable clinical researchers to fully exploit the benefits of the Healthgrid application. In this manuscript, we tested different implementations of the DTI analysis with respect to robustness and runtime. Based on the results, concrete application improvements as well as general suggestions for the layout and maintenance of Healthgrids are derived.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122733833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Effective Architecture for Automated Appliance Management System Applying Ontology-Based Cloud Discovery","authors":"A. V. Dastjerdi, Sayed Gholam Hassan Tabatabaei, R. Buyya","doi":"10.1109/CCGRID.2010.87","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.87","url":null,"abstract":"Cloud computing is a computing paradigm which allows on-demand access to computing elements and storage over the Internet. Virtual Appliances, pre-configured, ready-to-run applications, are emerging as a breakthrough technology to solve the complexities of service deployment on Cloud infrastructure. However, an automated approach to deploy required appliances on the most suitable Cloud infrastructure has been neglected by previous work and is the focus of this work. In this paper, we propose an effective architecture using ontology-based discovery to provide QoS aware deployment of appliances on Cloud service providers. In addition, we test our approach on a case study and the results show the efficiency and effectiveness of the proposed work.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116813584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Recovery from Failures in a Task Parallel Programming Model","authors":"James Dinan, Arjun Singri, P. Sadayappan, S. Krishnamoorthy","doi":"10.1109/CCGRID.2010.34","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.34","url":null,"abstract":"We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tracking mechanism. Compared with conventional checkpoint/restart techniques, this system offers a recovery penalty that is proportional to the degree of failure rather than the system size. We evaluate this system using the Self Consistent Field (SCF) kernel which forms an important component in ab initio methods for computational chemistry. Experimental results indicate that fault tolerant task pools are robust in the presence of an arbitrary number of failures and that they offer low overhead in the absence of faults.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129029704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds","authors":"Tatsuhiro Chiba, M. Burger, T. Kielmann, S. Matsuoka","doi":"10.1109/CCGRID.2010.63","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.63","url":null,"abstract":"Data-intensive parallel applications on clouds need to deploy large data sets from the cloud's storage facility to all compute nodes as fast as possible. Many multicast algorithms have been proposed for clusters and grid environments. The most common approach is to construct one or more spanning trees based on the network topology and network monitoring data in order to maximize available bandwidth and avoid bottleneck links. However, delivering optimal performance becomes difficult once the available bandwidth changes dynamically. In this paper, we focus on Amazon EC2/S3 (the most commonly used cloud platform today) and propose two high performance multicast algorithms. These algorithms make it possible to efficiently transfer large amounts of data stored in Amazon S3 to multiple Amazon EC2 nodes. The three salient features of our algorithms are (1) to construct an overlay network on clouds without network topology information, (2) to optimize the total throughput dynamically, and (3) to increase the download throughput by letting nodes cooperate with each other. The two algorithms differ in the way nodes cooperate: the first `non-steal' algorithm lets each node download an equal share of all data, while the second `steal' algorithm uses work stealing to counter the effect of heterogeneous download bandwidth. As a result, all nodes can download files from S3 quickly, even when the network performance changes while the algorithm is running. We evaluate our algorithms on EC2/S3, and show that they are scalable and consistently achieve high throughput. Both algorithms perform much better than having each node download all data directly from S3.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129072012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster Computing as an Assembly Process: Coordination with S-Net","authors":"C. Grelck, Jukka Julku, F. Penczek, A. Shafarenko","doi":"10.1109/CCGRID.2010.103","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.103","url":null,"abstract":"This poster will present a coordination language for distributed computing and will discuss its application to cluster computing. It will introduce a programming technique of cluster computing whereby application components are completely dissociated from the communication/coordination infrastructure (unlike MPI-style message passing), and there is no shared memory either, whether virtual or physical (unlike OpenMP). Cluster computing is thus presented as something that happens as late as the assembly stage: components are integrated into an application using a new form of network glue: Single-Input, Single-Output (SISO) asynchronous, non-deterministic coordination.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121800839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-FFT Vectorization for the Cell Multicore Processor","authors":"J. Barhen, T. Humble, P. Mitra, M. Traweek","doi":"10.1109/CCGRID.2010.78","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.78","url":null,"abstract":"The emergence of streaming multicore processors with multi-SIMD architectures and ultra-low power operation combined with real-time compute and I/O reconfigurability opens unprecedented opportunities for executing sophisticated signal processing algorithms faster and within a much lower energy budget. Here, we present an unconventional FFT implementation scheme for the IBM Cell, named transverse vectorization. It is shown to outperform (in terms of both timing and GFLOP throughput) the fastest FFT results reported to date in the open literature.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122303248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}