{"title":"Feasibility of Running Singularity Containers with Hybrid MPI on NASA High-End Computing Resources","authors":"Y. Chang, S. Heistand, R. Hood, Henry Jin","doi":"10.1109/CANOPIEHPC54579.2021.00007","DOIUrl":"https://doi.org/10.1109/CANOPIEHPC54579.2021.00007","url":null,"abstract":"This work investigates the feasibility of a Singularity container-based solution to support a customizable computing environment for running users' MPI applications in “hybrid” MPI mode, where the MPI on the host machine works in tandem with the MPI inside the container, on NASA's High-End Computing Capability (HECC) resources. Two types of real-world applications were tested: traditional High-Performance Computing (HPC) and Artificial Intelligence/Machine Learning (AI/ML). On the traditional HPC side, two JEDI containers built with Intel MPI for Earth science modeling were tested on both HECC in-house and HECC AWS Cloud CPU resources. On the AI/ML side, an NVIDIA TensorFlow container built with OpenMPI was tested with a Neural Collaborative Filtering recommender system and the ResNet-50 image classification system on the HECC in-house V100 GPUs. For each of these applications and resource environments, multiple hurdles were overcome after lengthy debugging efforts; the most significant ones were due to conflicts between the host MPI and the container MPI and to the complexity of the communication layers underneath. Although porting containers to run on a single node using just the container MPI is quite straightforward, our exercises demonstrate that running across multiple nodes in hybrid MPI mode requires knowledge of Singularity, MPI libraries, the operating system image, and the communication infrastructure such as the transport and network layers, which are traditionally handled by support staff of HPC centers and hardware or software vendors. In conclusion, porting and running Singularity containers on HECC resources or other data centers with similar environments is feasible, but most users would need help to run them in hybrid MPI mode.","PeriodicalId":237957,"journal":{"name":"2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130870887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Copyright notice]","authors":"","doi":"10.1109/canopiehpc54579.2021.00002","DOIUrl":"https://doi.org/10.1109/canopiehpc54579.2021.00002","url":null,"abstract":"","PeriodicalId":237957,"journal":{"name":"2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122834594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"It's a Scheduling Affair: GROMACS in the Cloud with the KubeFlux Scheduler","authors":"Claudia Misale, M. Drocco, Daniel J. Milroy, Carlos Eduardo Arango Gutierrez, Stephen Herbein, D. Ahn, Yoonho Park","doi":"10.1109/CANOPIEHPC54579.2021.00006","DOIUrl":"https://doi.org/10.1109/CANOPIEHPC54579.2021.00006","url":null,"abstract":"In this work, we address the problem of running HPC workloads efficiently on Kubernetes clusters. To do so, we compare Kubernetes' default scheduler with KubeFlux, a Kubernetes plug-in scheduler built on the Flux graph-based scheduler, on a 34-node Red Hat OpenShift cluster on IBM Cloud. We detail how scheduling can affect the performance of GROMACS, a well-known HPC application, and we show that KubeFlux can improve its performance through better pod scheduling. In our tests, KubeFlux demonstrates the tendency to limit the number of subnets spanned by a job and the maximum number of pods per node, translating to a > 2x speedup over the Kubernetes default scheduler in several cases.","PeriodicalId":237957,"journal":{"name":"2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127322451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RollingGantryCrane: Automation for unpacking containers into HPC environments","authors":"Gregory J. Zynda, S. Gopaulakrishnan, John M. Fonner","doi":"10.1109/CANOPIEHPC54579.2021.00008","DOIUrl":"https://doi.org/10.1109/CANOPIEHPC54579.2021.00008","url":null,"abstract":"Software containers are an important common currency for portability and reproducibility in the modern world of computing. While they are easy to share through public registries, usage documentation is often lacking, effectively leaving users with black boxes. RollingGantryCrane (RGC) is an open-source tool that takes generic software containers and automatically exposes the internal software through LMOD environment modules. Users provide the container URLs they wish to use, and RGC pulls the containers, collects descriptive metadata from public repositories, scans for non-standard executables on each container's search path, and generates LMOD modulefiles with help text and shell functions that transparently expose applications directly to the command line interface. RGC has been used in production since early 2019 on five production systems at The Texas Advanced Computing Center (TACC), allowing users to create bespoke modules and serving over 3000 unique tools from the BioContainers project.","PeriodicalId":237957,"journal":{"name":"2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)","volume":"435 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122485421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of CANOPIE-HPC 2021: 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC","authors":"","doi":"10.1109/canopiehpc54579.2021.00001","DOIUrl":"https://doi.org/10.1109/canopiehpc54579.2021.00001","url":null,"abstract":"","PeriodicalId":237957,"journal":{"name":"2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122262790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Workshop Chairs","authors":"Whpcsc","doi":"10.1109/canopiehpc54579.2021.00004","DOIUrl":"https://doi.org/10.1109/canopiehpc54579.2021.00004","url":null,"abstract":"Organizers of the Workshop on High Performance Computing for Smart Cities (WHPCSC 2019) are proud to present the selected papers in the current proceedings. This edition of the event is part of the Creating City initiative, a group formed in 2016 by worldwide researchers focused on cutting-edge solutions for modern cities. The terms Smart and Digital Cities have been used to describe recent advances towards more accessible, self-adaptive, and information-based cities. The building blocks of a Smart City are technological devices capable of communicating, processing information, and becoming part of the decision-making, but this requires high-performance computing together with computational intelligence capabilities. With these ideas in mind, previous events organized by the group focused on the Computational Intelligence field, such as the 1st and 2nd Workshops on Computational Intelligence and Smart Cities, while this workshop finally touched on the fundamental aspects of High Performance Computing (HPC) together with Smart Cities.","PeriodicalId":237957,"journal":{"name":"2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127774019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}