Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows

IF 2.5 · Zone 3 (Computer Science) · Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

International Journal of High Performance Computing Applications · Pub Date: 2023-04-11 · DOI: 10.1177/10943420231167811

S. Alam, M. Gila, Mark Klein, Maxime Martinasso, T. Schulthess

Volume 37, pp. 288-305 · Journal Article · https://doi.org/10.1177/10943420231167811


Abstract

For the past few decades, supercomputers have driven innovations in performance and scaling that benefit several scientific applications. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed, data-driven workflows, primarily because of rather rigid access methods and restricted configuration management options. The X-as-a-Service model of cloud computing has introduced, among other features, a developer-centric DevOps approach that empowers developers of artefacts ranging from infrastructure and platform to software, which, unfortunately, contemporary supercomputers still lack. We introduce vClusters (versatile software-defined clusters), which are based on Infrastructure-as-Code (IaC) technology. The vClusters approach is a unique fusion of HPC and cloud technologies, resulting in a software-defined, multi-tenant cluster on a supercomputing ecosystem that, together with software-defined storage, enables DevOps for complex, data-driven workflows such as grid middleware, alongside a classic HPC platform. IaC has been commonplace in cloud computing; however, it has lacked adoption within multi-petascale ecosystems because of concerns about performance and interoperability with classic HPC data centres' ecosystems. We present an overview of the Swiss National Supercomputing Centre's flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform, which includes an IaC-compliant, microservices architecture (MSA) management system that we leverage to demonstrate the use of vClusters for our diverse operational workflows. We provide implementation details of two operational vClusters platforms: a classic HPC platform, used predominantly by hundreds of users running thousands of large-scale numerical simulation batch jobs; and a widely used, data-intensive Grid computing middleware platform used for CERN Worldwide LHC Computing Grid (WLCG) operations. The resulting solution showcases reuse and reduction of common configuration recipes across vCluster implementations, minimising operational change management overheads while introducing flexibility for managing the DevOps artefacts required by diverse workflows.
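
The idea of reusing common configuration recipes across vCluster implementations can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any Alps or Shasta API: every name below (`COMMON_RECIPE`, `HPC_OVERRIDES`, `GRID_OVERRIDES`, `merge_recipes`, `build_vcluster`, and all keys and values) is a hypothetical placeholder. It only shows the general IaC pattern of one shared base recipe plus small per-cluster overrides.

```python
from copy import deepcopy


def merge_recipes(base, overrides):
    """Recursively merge `overrides` into a copy of `base` (inputs are not mutated)."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_recipes(merged[key], value)
        else:
            # Otherwise the override replaces (or adds) the value.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared recipe, written once and reused by every vCluster.
COMMON_RECIPE = {
    "os_image": "base-compute",
    "services": {"ssh": True, "monitoring": True},
    "storage": {"scratch": "shared"},
}

# Hypothetical per-cluster differences, kept deliberately small.
HPC_OVERRIDES = {"services": {"batch_scheduler": True}}
GRID_OVERRIDES = {
    "services": {"grid_middleware": True},
    "storage": {"scratch": "dedicated"},
}


def build_vcluster(name, overrides):
    """Produce a full cluster definition from the common recipe plus overrides."""
    config = merge_recipes(COMMON_RECIPE, overrides)
    config["name"] = name
    return config


hpc = build_vcluster("hpc", HPC_OVERRIDES)
grid = build_vcluster("grid", GRID_OVERRIDES)
```

Under these assumptions, a change to the shared recipe propagates to both cluster definitions, while each platform's specific needs stay confined to its own override block, which is one way to read the abstract's claim about reduced change management overhead.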


Source journal

International Journal of High Performance Computing Applications (Engineering & Technology; Computer Science: Interdisciplinary Applications)

CiteScore: 6.10

Self-citation rate: 6.50%

Review time: >12 weeks
