2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing: Latest Papers

Efficient Runtime Environment for Coupled Multi-physics Simulations: Dynamic Resource Allocation and Load-Balancing

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.107

S. Ko, Nayong Kim, Joohyun Kim, A. Thota, S. Jha

Abstract: Coupled multi-physics simulations, such as hybrid CFD-MD simulations, represent an increasingly important class of scientific applications. The physical problems of interest often demand high-end computers, such as TeraGrid resources, which are frequently accessible only via batch queues. Batch-queue systems were not developed to natively support the coordinated scheduling of jobs, which in turn is required for the concurrent execution of coupled multi-physics simulations. In this paper we develop and demonstrate a novel approach to overcome the lack of native support for the coordinated job submission that coupled runs require. We establish the performance advantages of our solution, which is a generalization of the Pilot-Job concept; the concept in and of itself is not new, but it is being applied to coupled simulations for the first time. Our solution not only overcomes the initial co-scheduling problem but also provides a dynamic resource allocation mechanism. Support for such dynamic resources is critical for a load-balancing mechanism, which we develop and demonstrate to be effective at reducing the total time-to-solution of the problem. We establish that the performance advantage of using Big Jobs is invariant with the size of the machine as well as the size of the physical model under investigation. The Pilot-Job abstraction is developed using SAGA, which provides an infrastructure-agnostic implementation and can seamlessly execute on and utilize distributed resources.

Citations: 33
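The Pilot-Job paper above does not spell out how its load balancing redistributes resources. As a minimal sketch, assuming a Pilot-Job (Big Job) that holds a fixed pool of cores and runs two coupled sub-jobs (say, the CFD and MD components) which must synchronize every coupling step, cores can be reassigned in proportion to each sub-job's measured work so both reach the synchronization point at about the same time. The function `rebalance` and its linear-scaling model are hypothetical illustrations, not the authors' method or part of SAGA.

```python
from __future__ import annotations


def rebalance(total_cores: int, cores: dict[str, int],
              step_times: dict[str, float]) -> dict[str, int]:
    """Hypothetical rebalancing rule for coupled sub-jobs inside one pilot.

    Assumes perfect linear scaling, so a sub-job's work per coupling step is
    cores * step_time, and equal step times need cores proportional to work.
    """
    work = {name: cores[name] * step_times[name] for name in cores}
    total_work = sum(work.values())
    # Proportional share of the pilot's cores, at least one core per sub-job.
    alloc = {name: max(1, round(total_cores * w / total_work))
             for name, w in work.items()}
    # Give any rounding surplus or deficit to the most expensive sub-job.
    heaviest = max(work, key=work.get)
    alloc[heaviest] += total_cores - sum(alloc.values())
    return alloc


if __name__ == "__main__":
    # MD is three times slower per step than CFD on an even 32/32 split.
    print(rebalance(64, {"cfd": 32, "md": 32}, {"cfd": 1.0, "md": 3.0}))
```

Here the split becomes 16 CFD cores and 48 MD cores. Under the linear-scaling assumption both components then take 2.0 time units per step instead of 1.0 and 3.0, so the coupled step time drops from 3.0 to 2.0. Real codes scale sub-linearly, so a practical balancer would repeat the adjustment with fresh timings.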

The Lightweight Approach to Use Grid Services with Grid Widgets on Grid WebOS

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.25

Yi-Lun Pan, Chang-Hsing Wu, Chia-Yen Liu, Hsi-En Yu, Weicheng Huang

Citations: 0

Experiments with Memory-to-Memory Coupling for End-to-End Fusion Simulation Workflows

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.101

C. Docan, Fan Zhang, M. Parashar, J. Cummings, N. Podhorszki, S. Klasky

Abstract: Scientific applications are striving to accurately simulate the multiple interacting physical processes that make up the complex phenomena being modeled. Efficient and scalable parallel implementations of these coupled simulations present challenging interaction and coordination requirements, especially when the coupled physical processes are computationally heterogeneous and progress at different speeds. In this paper, we present the design, implementation and evaluation of a memory-to-memory coupling framework for coupled scientific simulations on high-performance parallel computing platforms. The framework is driven by the coupling requirements of the Center for Plasma Edge Simulation, and it provides simple coupling abstractions as well as efficient asynchronous (RDMA-based) memory-to-memory data transport mechanisms that complement existing parallel programming systems and data-sharing frameworks. The framework enables flexible coupling behaviors that are asynchronous in time and space, and it supports dynamic coupling between heterogeneous simulation processes without enforcing any synchronization constraints. We evaluate the performance and scalability of the coupling framework using a specific coupling scenario on the Jaguar Cray XT5 system at Oak Ridge National Laboratory.

Citations: 13
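The coupling semantics described for the memory-to-memory framework above, where a producer publishes data and a consumer reads it without either side imposing a global synchronization, can be illustrated with a small in-process model. This is a sketch under stated assumptions: the class `DataSpace`, its `put`/`get` methods, and the variable name `flux` are hypothetical, and a Python dictionary guarded by a condition variable stands in for the framework's RDMA-based transport between nodes.

```python
from __future__ import annotations

import threading


class DataSpace:
    """Hypothetical shared space for versioned coupling data.

    put() never waits for readers; get() blocks only until the requested
    version of a variable has been written (or the timeout expires).
    """

    def __init__(self) -> None:
        self._data: dict[tuple[str, int], object] = {}
        self._cond = threading.Condition()

    def put(self, name: str, version: int, value: object) -> None:
        with self._cond:
            self._data[(name, version)] = value
            self._cond.notify_all()  # wake readers waiting on any version

    def get(self, name: str, version: int,
            timeout: float | None = None) -> object:
        with self._cond:
            ready = self._cond.wait_for(
                lambda: (name, version) in self._data, timeout)
            if not ready:
                raise TimeoutError(f"{name}@{version} not available")
            return self._data[(name, version)]


if __name__ == "__main__":
    space = DataSpace()

    def producer() -> None:
        # Stands in for one simulation advancing at its own pace.
        for step in range(3):
            space.put("flux", step, [float(step)] * 4)

    t = threading.Thread(target=producer)
    t.start()
    # The coupled simulation reads step 2 whenever it is ready for it.
    print(space.get("flux", 2, timeout=5.0))
    t.join()
```

Because each version is keyed independently, the reader can skip intermediate steps and the writer is never held back by a slow reader; a real framework would also have to evict old versions to bound memory.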

Elastic Site: Using Clouds to Elastically Extend Site Resources

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.80

Paul Marshall, K. Keahey, Timothy Freeman

Abstract: Infrastructure-as-a-Service (IaaS) cloud computing offers new possibilities to scientific communities. One of the most significant is the ability to elastically provision and relinquish resources in response to changes in demand. In our work, we develop a model of an “elastic site” that efficiently adapts services provided within a site, such as batch schedulers, storage archives, or Web services, to take advantage of elastically provisioned resources. We describe the system architecture along with the issues involved in elastic provisioning, such as security, privacy, and various logistical considerations. To avoid over- or under-provisioning resources, we propose three different policies to efficiently schedule resource deployment based on demand. We have implemented a resource manager, built on the Nimbus toolkit, to dynamically and securely extend existing physical clusters into the cloud. Our elastic site manager interfaces directly with local resource managers, such as Torque. We have developed and evaluated policies for resource provisioning on a Nimbus-based cloud at the University of Chicago, another at Indiana University, and Amazon EC2. We demonstrate a dynamic and responsive elastic cluster, capable of responding effectively to a variety of job submission patterns. We also demonstrate that we can process 10 times faster by expanding our cluster with up to 150 EC2 nodes.

Citations: 283
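The Elastic Site abstract mentions three demand-based provisioning policies without detailing them. The following minimal sketch shows what one such decision rule could look like, assuming one job per node and a cap on the number of cloud nodes; the function `cloud_nodes_delta` and its parameters are hypothetical, and the rule is not one of the paper's policies nor part of the Nimbus or Torque interfaces.

```python
def cloud_nodes_delta(queued_jobs: int, idle_local: int, cloud_running: int,
                      cloud_idle: int, max_cloud: int) -> int:
    """Hypothetical demand-driven rule: nodes to launch (>0) or release (<0).

    Assumes each queued job needs exactly one node. Launches nodes only for
    jobs that no idle local or cloud node can absorb, never exceeding
    max_cloud, and releases idle cloud nodes once the queue is empty.
    """
    if queued_jobs == 0:
        return -cloud_idle  # nothing waiting: stop paying for idle instances
    unmet = queued_jobs - idle_local - cloud_idle
    if unmet <= 0:
        return 0  # existing idle capacity covers the queue
    return min(unmet, max(0, max_cloud - cloud_running))


if __name__ == "__main__":
    print(cloud_nodes_delta(10, 2, 0, 0, 150))     # 8 new nodes
    print(cloud_nodes_delta(200, 0, 100, 0, 150))  # capped at 50
    print(cloud_nodes_delta(0, 4, 20, 5, 150))     # release 5 idle nodes
```

A rule this eager can over-provision for short jobs, since instance startup time can exceed job runtime; balancing that against queue wait is the over- versus under-provisioning trade-off the paper's policies are described as addressing.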

Performance Analysis of Diffusion Tensor Imaging in an Academic Production Grid

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.21

D. Krefting, R. Lützkendorf, Kathrin Peter, J. Bernarding

Abstract: Analysis of diffusion-weighted magnetic resonance images is increasingly used for non-invasive tracking of nerve fibers in the human brain, in both clinical diagnosis and basic research. Diffusion-tensor imaging (DTI) enables in-vivo research on the internal structure of the central nervous system, estimation of the interconnection of functional areas, and diagnosis of brain tumors and demyelinating diseases. However, modeling the local diffusion parameters is computationally expensive, and runtimes of up to days are common on standard desktop computers. A workflow-based grid implementation of the algorithm with slice-based parallelization has shown significant speedup. In production use, however, the implementation was frequently delayed and even failed, discouraging the medical collaborators from taking over the management of the data processing themselves. A comprehensive analysis of possible sources of errors and delays, and of their real impact in the respective infrastructure, is therefore vital to enable clinical researchers to fully exploit the benefits of the Healthgrid application. In this manuscript, we tested different implementations of the DTI analysis with respect to robustness and runtime. Based on the results, we derive concrete application improvements as well as general suggestions for the layout and maintenance of Healthgrids.

Citations: 7
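The DTI abstract reports that the slice-parallel grid implementation was often delayed or failed in production use. A minimal sketch of slice-based parallelization with resubmission of failed slices is shown here; it uses local threads in place of grid jobs, and the names `run_slices`, `fit_slice`, and `max_attempts` are hypothetical rather than taken from the authors' workflow.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_slices(n_slices: int, fit_slice: Callable[[int], object],
               max_attempts: int = 3, workers: int = 8) -> dict[int, object]:
    """Fit every image slice independently, retrying slices whose job fails.

    Threads stand in for grid jobs; an exception from fit_slice models a
    failed job. A slice that fails max_attempts times makes run_slices raise.
    """
    def attempt(idx: int) -> object:
        last_error: Exception | None = None
        for _ in range(max_attempts):
            try:
                return fit_slice(idx)
            except Exception as exc:  # treat any error as a failed job
                last_error = exc
        raise RuntimeError(
            f"slice {idx} failed after {max_attempts} attempts") from last_error

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {idx: pool.submit(attempt, idx) for idx in range(n_slices)}
        # result() re-raises the RuntimeError of any slice that gave up.
        return {idx: fut.result() for idx, fut in futures.items()}


if __name__ == "__main__":
    calls: dict[int, int] = {}

    def flaky_fit(idx: int) -> int:
        # Hypothetical tensor fit that fails on the first try for every slice.
        calls[idx] = calls.get(idx, 0) + 1
        if calls[idx] == 1:
            raise OSError("transient grid failure")
        return idx * idx

    print(run_slices(5, flaky_fit))
```

Retrying per slice keeps a single transient failure from forcing a rerun of the whole volume, since slices are independent; persistent failures still surface as an error instead of a silently incomplete result.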

An Effective Architecture for Automated Appliance Management System Applying Ontology-Based Cloud Discovery

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.87

A. V. Dastjerdi, Sayed Gholam Hassan Tabatabaei, R. Buyya

Citations: 112

Selective Recovery from Failures in a Task Parallel Programming Model

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.34

James Dinan, Arjun Singri, P. Sadayappan, S. Krishnamoorthy

Citations: 10

Dynamic Load-Balanced Multicast for Data-Intensive Applications on Clouds

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.63

Tatsuhiro Chiba, M. Burger, T. Kielmann, S. Matsuoka

Abstract: Data-intensive parallel applications on clouds need to deploy large data sets from the cloud's storage facility to all compute nodes as fast as possible. Many multicast algorithms have been proposed for clusters and grid environments. The most common approach is to construct one or more spanning trees based on the network topology and network monitoring data in order to maximize available bandwidth and avoid bottleneck links. However, delivering optimal performance becomes difficult once the available bandwidth changes dynamically. In this paper, we focus on Amazon EC2/S3 (the most commonly used cloud platform today) and propose two high-performance multicast algorithms. These algorithms make it possible to efficiently transfer large amounts of data stored in Amazon S3 to multiple Amazon EC2 nodes. The three salient features of our algorithms are (1) constructing an overlay network on clouds without network topology information, (2) optimizing the total throughput dynamically, and (3) increasing the download throughput by letting nodes cooperate with each other. The two algorithms differ in the way nodes cooperate: the first, "non-steal" algorithm lets each node download an equal share of all data, while the second, "steal" algorithm uses work stealing to counter the effect of heterogeneous download bandwidth. As a result, all nodes can download files from S3 quickly, even when the network performance changes while the algorithm is running. We evaluate our algorithms on EC2/S3 and show that they are scalable and consistently achieve high throughput. Both algorithms perform much better than having each node download all data directly from S3.

Citations: 31
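The difference between the "non-steal" and "steal" multicast algorithms above can be illustrated with a small discrete-event model of the S3 download phase only. The sketch assumes equal-sized chunks, a constant per-node download bandwidth, and that an idle node steals from the tail of the busiest node's remaining queue; it ignores the subsequent exchange of chunks between nodes, and the function `download_time` is hypothetical rather than the authors' implementation.

```python
from __future__ import annotations

import heapq
from collections import deque


def download_time(bandwidths: list[float], n_chunks: int, chunk_mb: float,
                  steal: bool) -> float:
    """Simulated time until all chunks are fetched from storage.

    Every node starts with an equal, contiguous share of the chunks. With
    steal=False each node fetches only its own share; with steal=True an
    idle node takes a chunk from the tail of the busiest node's queue.
    """
    n = len(bandwidths)
    queues = [deque(range(i * n_chunks // n, (i + 1) * n_chunks // n))
              for i in range(n)]
    events = [(0.0, i) for i in range(n)]  # (time node becomes free, node)
    heapq.heapify(events)
    finish = 0.0
    while events:
        now, i = heapq.heappop(events)
        if queues[i]:
            queues[i].popleft()  # next chunk of the node's own share
        elif steal and any(queues):
            victim = max(range(n), key=lambda j: len(queues[j]))
            queues[victim].pop()  # steal from the tail, away from the owner
        else:
            finish = max(finish, now)  # node is done; record its end time
            continue
        heapq.heappush(events, (now + chunk_mb / bandwidths[i], i))
    return finish


if __name__ == "__main__":
    # One fast (40 MB/s) and one slow (10 MB/s) node, 10 chunks of 10 MB.
    print(download_time([40.0, 10.0], 10, 10.0, steal=False))  # 5.0
    print(download_time([40.0, 10.0], 10, 10.0, steal=True))   # 2.0
```

With an equal split the fast node is idle after 1.25 s while the slow one needs 5 s; with stealing both finish at 2 s, which matches the aggregate-bandwidth bound of 100 MB / 50 MB/s. With identical bandwidths the two modes behave the same, since no node runs out of work early.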

Cluster Computing as an Assembly Process: Coordination with S-Net

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.103

C. Grelck, Jukka Julku, F. Penczek, A. Shafarenko

Citations: 0

Multi-FFT Vectorization for the Cell Multicore Processor

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Pub Date : 2010-05-17 DOI: 10.1109/CCGRID.2010.78

J. Barhen, T. Humble, P. Mitra, M. Traweek

Citations: 3