Using Pilot Jobs and CernVM File System for Simplified Use of Containers and Software Distribution

N. Urs, M. Mambelli, D. Dykstra
DOI: 10.1145/3431379.3464451
Published in: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing
Publication date: 2020-06-21
Citations: 0

Abstract

High Energy Physics (HEP) experiments require an abundance of computing resources, i.e. sites, to process data for simulations and analyses. This requirement is fulfilled by local batch farms, grid sites, private/commercial clouds, and supercomputing centers via High Throughput Computing (HTC). The growing needs of such experiments, combined with the increasing heterogeneity of the resources, make it difficult for physicists to handle these resources directly. Additionally, HEP collaborations rely heavily on data and software releases, typically on the order of tens of gigabytes, while conducting simulations and analyses. Hence, scalability, reliability, and maintenance become crucial concerns in the distribution of the necessary data and software stack. The GlideinWMS [4] framework helps with the resource management problem by using pilot jobs, aka Glideins, to provision reliable elastic virtual clusters. Glideins are submitted to unreliable, heterogeneous resources, which they validate and customize to make the worker nodes available for end-user job execution. The CernVM File System (CernVM-FS or CVMFS) [1], on the other hand, helps with data distribution. It is a write-once, read-everywhere filesystem used to deploy scientific software to thousands of nodes on a worldwide distributed computing infrastructure. CVMFS is based on the Hypertext Transfer Protocol and has been widely used within the particle physics community for (1) distributing experiment software and data such as calibrations, and (2) facilitating containerization by efficiently hosting container images and providing containerization software, especially Singularity [3]. GlideinWMS relies on CVMFS installed locally on the computing resources to satisfy the experiments' software needs. This requires effort from system administrators to install and maintain CVMFS at the sites and limits the use of sites, especially HPC resources, that do not have CVMFS installed.
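To a user job, a CVMFS repository appears as an ordinary read-only directory tree, conventionally mounted under /cvmfs/&lt;repository&gt;, so locating a software release reduces to resolving a path. A minimal Python sketch of this lookup follows; the repository and release names are illustrative assumptions, not taken from the poster:

```python
import os

def find_release(cvmfs_root, repository, release):
    """Return the path of a software release on a CVMFS mount, or None.

    CVMFS repositories conventionally appear under /cvmfs/<repository>,
    e.g. /cvmfs/cms.cern.ch; the repository and release names used by a
    caller are experiment-specific.
    """
    path = os.path.join(cvmfs_root, repository, release)
    return path if os.path.isdir(path) else None
```

Because the mount is read-only and content is fetched lazily over HTTP, thousands of worker nodes can share one release tree this way without local installation of the software itself.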
This poster presents a solution that takes advantage of Glideins to provide CVMFS at most sites without the need for a local installation. Doing so expands the pool of resources available for HEP experiments and reduces the effort required of system administrators for current resources. Additionally, the proposed solution allows GlideinWMS to also start Singularity [3], containerization software that can run unprivileged, on sites where neither CVMFS nor Singularity is available, including HPC sites. The benefits provided by this solution are: (1) lower overhead for site administrators, since they have less software to install; (2) an expanded pool of resources that run user jobs with easy access to the software and data provided by CVMFS, making life easier for the scientists; and (3) improved flexibility in using HPC resources by enabling GlideinWMS pilot jobs to support HPC sites.