HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications
D. Panda, Xiaoyi Lu
Proceedings of the 10th International Conference on Utility and Cloud Computing, published 2017-12-05
DOI: 10.1145/3147213.3149455
Citations: 3
Abstract
The last few years have seen significant growth in HPC clusters built with multi-/many-core processors, accelerators, and high-performance interconnects such as InfiniBand, Omni-Path, iWARP, and RoCE. To reduce costs, it is increasingly attractive to share HPC cluster resources with end users through virtualization, for both scientific computing and Big Data processing. In this tutorial, we first give an overview of popular virtualization system software for HPC cloud environments, such as hypervisors (e.g., KVM), containers (e.g., Docker, Singularity), OpenStack, and Slurm. We then survey the high-performance interconnects and communication mechanisms available on HPC clouds, such as InfiniBand, RDMA, SR-IOV, and IVShmem. We further discuss the opportunities and technical challenges of designing high-performance MPI runtimes for these environments. Next, we introduce our proposed novel approaches to enhancing MPI library design on SR-IOV-enabled InfiniBand clusters, using both virtual machines and containers. We also discuss how to integrate these designs into popular cloud management systems such as OpenStack and into HPC cluster resource managers such as Slurm. Beyond HPC middleware and applications, we demonstrate how high-performance solutions can be designed to run Big Data and Deep Learning workloads (such as Hadoop, Spark, TensorFlow, CNTK, and Caffe) in HPC cloud environments.