Towards full AI model lifecycle management on EuroHPC systems, experiences with AIFS for DestinE

Thomas Geenen, Even Marius Nordhagen, Victor Sanchez, Cathal O'Brien, Simon Lang, Mihai Alexe, Ana Prieto Nemesio, Gert Mertes, Rakesh Prithiviraj, Jesper Dramsch, Baudouin Raoult, Florian Pinault, Helen Theissen, Sara Hahner, Mario Santa Cruz, Matthew Chantry, Nils Wedi
Procedia Computer Science, Volume 255, Pages 93-102. Published 2025-01-01. DOI: 10.1016/j.procs.2025.02.264
https://www.sciencedirect.com/science/article/pii/S1877050925006258

Abstract

On October 13, 2023, ECMWF released the first alpha version of its Artificial Intelligence Forecasting System (AIFS), ECMWF's data-driven forecast model. This first release came just a few months after ECMWF started developing the model, which highlights the growing effort in machine learning (ML) that ECMWF has been building over the last few years. This paper describes the use of AIFS on EuroHPC systems in the context of DestinE. The main focus is on performance benchmarks on the different EuroHPC systems available to DestinE but also, to a large extent, on the deployment and use of the tools that support model lifecycle management. EuroHPC systems have already proven to be of great value for DestinE, and in this paper we describe how we leverage these systems for artificial intelligence (AI) and ML models in DestinE. We are working closely with EuroHPC and the EuroHPC hosting sites, through co-design and the optimization of existing solutions, to improve the usage of these systems in every step of lifecycle management for AI and ML models. The performance benchmarks of our models on several EuroHPC systems showed that the speedup is close to linear up to several thousand GPUs, but that each EuroHPC system requires a different optimization strategy to achieve this. For model lifecycle management, we found that we can use our in-house developed, domain-specific framework on EuroHPC systems, and we highlight some specific modifications and future improvements for EuroHPC systems. We also provide implementation details and share our experiences on how to retrieve and collect provenance data and information from models running on EuroHPC systems using cloud-native frameworks deployed outside the EuroHPC system. Although the solutions described in this paper are designed to support our specific requirements and context, we believe that the proposed solutions, developments and implementation details can also bring value beyond the broader NWP community.