Containerized Bioinformatics Ecosystem for HPC
Yucheng Zhang, Lev Gorenstein, Payas Bhutra, Ryan T. DeRue
2022 IEEE/ACM International Workshop on HPC User Support Tools (HUST), November 2022
DOI: 10.1109/HUST56722.2022.00006
Abstract
Container technologies such as Docker and SingularityCE package an application together with everything it needs to run into an isolated environment. As a result, a containerized application runs the same way regardless of the host environment, which makes container technology a critical tool for data reproducibility in science. In high-performance computing (HPC) environments, SingularityCE has been widely adopted, primarily because it significantly reduces the effort system administrators spend deploying applications. One domain where we see particular potential for this technology is the deployment of bioinformatics applications. Bioinformatics is an interdisciplinary field combining biology, chemistry, computer science, mathematics, statistics, and other sciences. Traditionally, HPC system administrators may need thousands of hours to compile, install, and deploy a broad stack of bioinformatics applications for users; HPC-friendly container technologies have the potential to transform these traditional methods of installing and managing applications. This paper describes how our HPC center used SingularityCE to provide more than 600 containerized bioinformatics applications, tested by staff with bioinformatics expertise, on six campus production systems as well as ACCESS Anvil. It also explores how we used Lmod environment modules for these container images to make containerization transparent to users. Finally, it discusses how we deployed applications with a graphical user interface (GUI) as Open OnDemand interactive apps, how we modified Python-based container images to support Jupyter notebooks, and how we generated detailed usage documentation for each application on the ReadTheDocs platform. Together, these contributions provide a robust and reproducible computing ecosystem for life science researchers. The general approach outlined in this paper can be readily adapted to any underlying container technology and any collection of applications.
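The abstract states that Lmod environment modules make the containers transparent to users but does not show what such a module contains. As a minimal sketch, assuming the common pattern of wrapping each program in a shell function that calls `singularity exec`, the Python snippet below renders a Lua modulefile for one image. The image path, program names, and the `render_modulefile` helper are hypothetical, and this is not the authors' implementation; `whatis` and `set_shell_function` are standard Lmod modulefile functions.

```python
def render_modulefile(name: str, version: str, image: str,
                      programs: list[str]) -> str:
    """Render Lua source for a hypothetical Lmod modulefile that wraps
    every program in `programs` around `singularity exec <image>`."""
    out = [
        f'whatis("Name: {name}")',
        f'whatis("Version: {version}")',
        # The image path lives in one Lua variable, so swapping images
        # means editing a single line of the modulefile.
        f'local image = "{image}"',
    ]
    for prog in programs:
        # bash/zsh: a shell function that forwards all arguments via "$@".
        bash = f"'singularity exec ' .. image .. ' {prog} \"$@\"'"
        # csh: Lmod defines an alias instead; csh appends the alias's
        # arguments automatically, so no argument reference is needed.
        csh = f"'singularity exec ' .. image .. ' {prog}'"
        out.append(f'set_shell_function("{prog}", {bash}, {csh})')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical image path and program list, for illustration only.
    print(render_modulefile("samtools", "1.16",
                            "/apps/images/samtools_1.16.sif",
                            ["samtools", "bgzip"]))
```

Loading a module built this way would define `samtools` and `bgzip` as shell functions, so users invoke them exactly as if the tools were installed natively, while the container invocation stays hidden inside the modulefile.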
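The abstract also mentions modifying Python-based images to support Jupyter notebooks. One plausible way to expose such an image to Jupyter, sketched here under the assumption that `ipykernel` is already installed inside the image, is a kernelspec whose `argv` launches the kernel through `singularity exec`. The image path, display name, and `container_kernelspec` helper are hypothetical; `argv`, `display_name`, `language`, and the `{connection_file}` placeholder are part of Jupyter's kernelspec format.

```python
import json


def container_kernelspec(image: str, display_name: str) -> dict:
    """Build a Jupyter kernel.json dict that starts ipykernel inside a
    SingularityCE image (assumes ipykernel is installed in the image)."""
    return {
        "argv": [
            "singularity", "exec", image,
            # Standard ipykernel entry point; Jupyter substitutes the
            # real connection-file path for {connection_file} at launch.
            "python", "-m", "ipykernel_launcher",
            "-f", "{connection_file}",
        ],
        "display_name": display_name,
        "language": "python",
    }


if __name__ == "__main__":
    # Hypothetical image and kernel name, for illustration only.
    spec = container_kernelspec("/apps/images/biopython_1.79.sif",
                                "Biopython 1.79 (container)")
    print(json.dumps(spec, indent=2))
```

Saving this JSON as `kernel.json` in a directory on one of Jupyter's kernel search paths would make the kernel selectable in a notebook. The connection file Jupyter passes in must be readable inside the container, so its directory has to be bind-mounted; SingularityCE mounts the user's home directory by default, which may cover this, but that should be verified per site.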