Containerized Bioinformatics Ecosystem for HPC

Yucheng Zhang, Lev Gorenstein, Payas Bhutra, Ryan T. DeRue
Published in: 2022 IEEE/ACM International Workshop on HPC User Support Tools (HUST)
Publication date: 2022-11-01
DOI: 10.1109/HUST56722.2022.00006 (https://doi.org/10.1109/HUST56722.2022.00006)

Abstract

Container technologies such as Docker and SingularityCE package an application together with everything it needs to run in an isolated environment. Containerized applications therefore behave the same way regardless of the system on which they run, which makes container technology a critical tool for data reproducibility in science. In high-performance computing (HPC) environments, SingularityCE is widely used, primarily because it significantly reduces the work system administrators must do to deploy applications. Bioinformatics is one domain where this technology shows particular promise. It is an interdisciplinary field combining biology, chemistry, computer science, mathematics, statistics, and other areas of science. Traditionally, HPC system administrators may need thousands of hours to compile, install, and deploy a broad stack of bioinformatics applications for users. HPC-friendly container technologies have the potential to transform these traditional methods of installing and managing applications. This paper describes how our HPC center used SingularityCE to provide over 600 containerized bioinformatics applications that were tested by staff with expertise in bioinformatics, on six campus production systems as well as ACCESS Anvil. It also explains how, by leveraging Lmod, we made containerization transparent to users through environment modules for these container images. Finally, it discusses how we deployed applications with a graphical user interface (GUI) to Open OnDemand as interactive applications, how we modified Python-based container images to support Jupyter notebooks, and how we generated detailed usage documentation for each application on the ReadTheDocs platform. Together, these contributions provide a robust and reproducible computing ecosystem for life science researchers. The general approach outlined in this paper can be readily adapted to any underlying container technology and any collection of applications.
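
To make the idea of "transparent" containerization through environment modules more concrete, the following is a minimal sketch, not the authors' implementation. It generates the text of an Lmod (Lua) modulefile in which each program name becomes a shell function that runs the same program inside a container image via `singularity exec`. The function name `render_modulefile`, the image path, and the example application are all hypothetical, chosen only for illustration.

```python
def render_modulefile(app: str, version: str, image: str, programs: list[str]) -> str:
    """Return the text of an Lmod (Lua) modulefile wrapping container commands.

    All arguments are illustrative assumptions: `image` is a hypothetical path
    to a container image, and `programs` lists executables inside that image.
    """
    lines = [
        f'whatis("Name: {app}")',
        f'whatis("Version: {version}")',
        f'local image = "{image}"',
        "",
    ]
    for prog in programs:
        # Lmod's set_shell_function takes a bash/zsh body and a csh body.
        # bash forwards arguments with "$@"; csh forwards them with $*.
        lines.append(
            f'set_shell_function("{prog}", '
            f'"singularity exec " .. image .. " {prog} \\"$@\\"", '
            f'"singularity exec " .. image .. " {prog} $*")'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: one application exposing a single executable.
    print(render_modulefile(
        app="samtools",
        version="1.15",
        image="/opt/images/samtools_1.15.sif",
        programs=["samtools"],
    ))
```

Under these assumptions, once a user loads such a module, typing the program name would invoke it inside the container without the user needing to know that a container is involved.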