Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures

M. Riedel, Rocco Sedona, C. Barakat, Pétur Helgi Einarsson, R. Hassanian, Gabriele Cavallaro, Matthias Book, Helmut Neukirchen, A. Lintermann
{"title":"Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures","authors":"M. Riedel, Rocco Sedona, C. Barakat, Pétur Helgi Einarsson, R. Hassanian, Gabriele Cavallaro, Matthias Book, Helmut Neukirchen, A. Lintermann","doi":"10.1109/IPDPSW52791.2021.00019","DOIUrl":null,"url":null,"abstract":"We observe a continuously increased use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., ’big data’) that requires powerful computing resources with equally increasing performance. Consequently, innovative heterogeneous High-Performance Computing (HPC) systems based on multi-core CPUs and many-core GPUs require an architectural design that addresses end user communities’ requirements that take advantage of ML and DL. Still the workloads of end user communities of the simulation sciences (e.g., using numerical methods based on known physical laws) needs to be equally supported in those architectures. This paper offers insights into the Modular Supercomputer Architecture (MSA) developed in the Dynamic Exascale Entry Platform (DEEP) series of projects to address the requirements of both simulation sciences and data-intensive sciences such as High Performance Data Analytics (HPDA). It shares insights into implementing the MSA in the Jülich Supercomputing Centre (JSC) hosting Europe No. 1 Supercomputer Jülich Wizard for European Leadership Science (JUWELS). We augment the technical findings with experience and lessons learned from two application communities case studies (i.e., remote sensing and health sciences) using the MSA with JUWELS and the DEEP systems in practice. Thus, the paper provides details into specific MSA design elements that enable significant performance improvements of ML and DL algorithms. While this paper focuses on MSA-based HPC systems and application experience, we are not losing sight of advances in Cloud Computing (CC) and Quantum Computing (QC) relevant for ML and DL.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW52791.2021.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

We observe a continuously increased use of Deep Learning (DL) as a specific type of Machine Learning (ML) for data-intensive problems (i.e., ’big data’) that requires powerful computing resources with equally increasing performance. Consequently, innovative heterogeneous High-Performance Computing (HPC) systems based on multi-core CPUs and many-core GPUs require an architectural design that addresses end user communities’ requirements that take advantage of ML and DL. Still the workloads of end user communities of the simulation sciences (e.g., using numerical methods based on known physical laws) needs to be equally supported in those architectures. This paper offers insights into the Modular Supercomputer Architecture (MSA) developed in the Dynamic Exascale Entry Platform (DEEP) series of projects to address the requirements of both simulation sciences and data-intensive sciences such as High Performance Data Analytics (HPDA). It shares insights into implementing the MSA in the Jülich Supercomputing Centre (JSC) hosting Europe No. 1 Supercomputer Jülich Wizard for European Leadership Science (JUWELS). We augment the technical findings with experience and lessons learned from two application communities case studies (i.e., remote sensing and health sciences) using the MSA with JUWELS and the DEEP systems in practice. Thus, the paper provides details into specific MSA design elements that enable significant performance improvements of ML and DL algorithms. While this paper focuses on MSA-based HPC systems and application experience, we are not losing sight of advances in Cloud Computing (CC) and Quantum Computing (QC) relevant for ML and DL.
在异构模块化超级计算架构中使用并行和可扩展机器学习的实践和经验
我们观察到深度学习(DL)作为一种特定类型的机器学习(ML)用于数据密集型问题(即“大数据”)的使用不断增加,这些问题需要强大的计算资源和同样提高的性能。因此,基于多核cpu和多核gpu的创新型异构高性能计算(HPC)系统需要一种架构设计,以满足终端用户群体利用ML和DL的需求。但是,模拟科学的最终用户社区的工作负载(例如,使用基于已知物理定律的数值方法)需要在这些体系结构中得到同样的支持。本文提供了在动态百亿亿级入口平台(DEEP)系列项目中开发的模块化超级计算机体系结构(MSA)的见解,以满足模拟科学和数据密集型科学(如高性能数据分析(HPDA))的需求。它分享了在j利希超级计算中心(JSC)实施MSA的见解,该中心托管欧洲第一超级计算机j利希巫师欧洲领导力科学(JUWELS)。我们从两个应用社区案例研究(即遥感和健康科学)中获得经验和教训,在实践中使用MSA与JUWELS和DEEP系统,从而增强了技术发现。因此,本文提供了具体的MSA设计元素的细节,这些元素能够显著提高ML和DL算法的性能。虽然本文的重点是基于msa的HPC系统和应用体验,但我们并没有忽视与ML和DL相关的云计算(CC)和量子计算(QC)的进步。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信