Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows

IF 3.5 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
S. Alam, M. Gila, Mark Klein, Maxime Martinasso, T. Schulthess
{"title":"Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows","authors":"S. Alam, M. Gila, Mark Klein, Maxime Martinasso, T. Schulthess","doi":"10.1177/10943420231167811","DOIUrl":null,"url":null,"abstract":"Supercomputers have been driving innovations for performance and scaling benefiting several scientific applications for the past few decades. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed data-driven workflows, primarily due to rather rigid access methods and restricted configuration management options. X-as-a-Service model of cloud has introduced, among other features, a developer-centric DevOps approach empowering developers of infrastructure, platform to software artefacts, which, unfortunately contemporary supercomputers still lack. We introduce vClusters (versatile software-defined clusters), which is based on Infrastructure-as-code (IaC) technology. vClusters approach is a unique fusion of HPC and cloud technologies resulting in a software-defined, multi-tenant cluster on a supercomputing ecosystem, that, together with software-defined storage, enable DevOps for complex, data-driven workflows like grid middleware, alongside a classic HPC platform. IaC has been a commonplace in cloud computing, however, it lacked adoption within multi-Petascale ecosystems due to concerns related to performance and interoperability with classic HPC data centres’ ecosystems. We present an overview of the Swiss National Supercomputing Centre’s flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform that includes an IaC compliant, microservices architecture (MSA) management system, which we leverage for demonstrating vClusters usage for our diverse operational workflows. We provide implementation details of two operational vClusters platforms: a classic HPC platform that is used predominantly by hundreds of users running thousands of large-scale numerical simulations batch jobs; and a widely used, data-intensive, Grid computing middleware platform used for CERN Worldwide LHC Computing Grid (WLCG) operations. The resulting solution showcases reuse and reduction of common configuration recipes across vCluster implementations, minimising operational change management overheads while introducing flexibility for managing artefacts for DevOps required by diverse workflows.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"288 - 305"},"PeriodicalIF":3.5000,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of High Performance Computing Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/10943420231167811","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Supercomputers have been driving innovations for performance and scaling benefiting several scientific applications for the past few decades. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed data-driven workflows, primarily due to rather rigid access methods and restricted configuration management options. X-as-a-Service model of cloud has introduced, among other features, a developer-centric DevOps approach empowering developers of infrastructure, platform to software artefacts, which, unfortunately contemporary supercomputers still lack. We introduce vClusters (versatile software-defined clusters), which is based on Infrastructure-as-code (IaC) technology. vClusters approach is a unique fusion of HPC and cloud technologies resulting in a software-defined, multi-tenant cluster on a supercomputing ecosystem, that, together with software-defined storage, enable DevOps for complex, data-driven workflows like grid middleware, alongside a classic HPC platform. IaC has been a commonplace in cloud computing, however, it lacked adoption within multi-Petascale ecosystems due to concerns related to performance and interoperability with classic HPC data centres’ ecosystems. We present an overview of the Swiss National Supercomputing Centre’s flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform that includes an IaC compliant, microservices architecture (MSA) management system, which we leverage for demonstrating vClusters usage for our diverse operational workflows. We provide implementation details of two operational vClusters platforms: a classic HPC platform that is used predominantly by hundreds of users running thousands of large-scale numerical simulations batch jobs; and a widely used, data-intensive, Grid computing middleware platform used for CERN Worldwide LHC Computing Grid (WLCG) operations. The resulting solution showcases reuse and reduction of common configuration recipes across vCluster implementations, minimising operational change management overheads while introducing flexibility for managing artefacts for DevOps required by diverse workflows.
Alps超级计算机上的多功能软件定义HPC和云集群,用于不同的工作流程
在过去的几十年里,超级计算机一直在推动性能和规模的创新,使一些科学应用受益。然而,在集成分布式数据驱动的工作流时,他们的生态系统几乎没有变化,这主要是由于访问方法相当严格和配置管理选项有限。X-as-a-Service云模型引入了以开发者为中心的DevOps方法,为基础设施、平台到软件人工制品的开发者提供了能力,不幸的是,当代超级计算机仍然缺乏这种方法。我们介绍了vClusters(通用软件定义集群),它基于基础设施即代码(IaC)技术。vClusters方法是HPC和云技术的独特融合,在超级计算生态系统上形成了一个软件定义的多租户集群,与软件定义的存储一起,使DevOps能够实现复杂的数据驱动工作流,如网格中间件,以及经典的HPC平台。IaC在云计算中很常见,但由于担心性能和与传统HPC数据中心生态系统的互操作性,它在多Petascale生态系统中缺乏采用。我们概述了瑞士国家超级计算中心的旗舰阿尔卑斯生态系统,作为HPC和数据驱动工作流vClusters的实施目标。Alps基于Cray HPE Shasta EX超级计算平台,该平台包括一个符合IaC的微服务架构(MSA)管理系统,我们利用该系统来展示vClusters在我们多样化的运营工作流程中的使用情况。我们提供了两个可操作vClusters平台的实现细节:一个经典的HPC平台,主要由数百名运行数千个大规模数值模拟批处理作业的用户使用;以及一个广泛使用的、数据密集型的网格计算中间件平台,用于CERN全球LHC计算网格(WLCG)操作。由此产生的解决方案展示了vCluster实现中常见配置配方的重用和减少,最大限度地减少了运营更改管理开销,同时为管理不同工作流所需的DevOps工件引入了灵活性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
International Journal of High Performance Computing Applications
International Journal of High Performance Computing Applications 工程技术-计算机:跨学科应用
CiteScore
6.10
自引率
6.50%
发文量
32
审稿时长
>12 weeks
期刊介绍: With ever increasing pressure for health services in all countries to meet rising demands, improve their quality and efficiency, and to be more accountable; the need for rigorous research and policy analysis has never been greater. The Journal of Health Services Research & Policy presents the latest scientific research, insightful overviews and reflections on underlying issues, and innovative, thought provoking contributions from leading academics and policy-makers. It provides ideas and hope for solving dilemmas that confront all countries.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信