Preprocessing Pipeline Optimization for Scientific Deep Learning Workloads

K. Ibrahim, L. Oliker
{"title":"Preprocessing Pipeline Optimization for Scientific Deep Learning Workloads","authors":"K. Ibrahim, L. Oliker","doi":"10.1109/ipdps53621.2022.00112","DOIUrl":null,"url":null,"abstract":"Newly developed machine learning technology is promising to profoundly impact high-performance computing, with the potential to significantly accelerate scientific discoveries. However, scientific machine learning performance is often constrained by data movement overheads, particularly on existing and emerging hardware-accelerated systems. In this work, we focus on optimizing the data movement across storage and memory systems, by developing domain-specific data encoder/decoders. These plugins have the dual benefit of significantly reducing communication while enabling efficient decoding on the accelerated hardware. We explore detailed performance analysis for two important scientific learning workloads from cosmology and climate analytics, CosmoFlow and DeepCAM, on the GPU-enabled Summit and Cori supercomputers. Results demonstrate that our optimizations can significantly improve overall performance by up to 10× compared with the default baseline, while preserving convergence behavior. Overall, this methodology can be applied to various machine learning domains and emerging AI technologies.","PeriodicalId":321801,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"10 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ipdps53621.2022.00112","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Newly developed machine learning technology promises to profoundly impact high-performance computing, with the potential to significantly accelerate scientific discovery. However, scientific machine learning performance is often constrained by data-movement overheads, particularly on existing and emerging hardware-accelerated systems. In this work, we focus on optimizing data movement across storage and memory systems by developing domain-specific data encoders/decoders. These plugins have the dual benefit of significantly reducing communication while enabling efficient decoding on the accelerator hardware. We present a detailed performance analysis of two important scientific learning workloads from cosmology and climate analytics, CosmoFlow and DeepCAM, on the GPU-enabled Summit and Cori supercomputers. Results demonstrate that our optimizations can improve overall performance by up to 10× over the default baseline while preserving convergence behavior. Overall, this methodology can be applied to various machine learning domains and emerging AI technologies.
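The core pattern the abstract describes is encode-on-store / decode-on-load: samples are re-encoded offline into a more compact representation so fewer bytes cross the storage and memory hierarchy, and a cheap decode step runs where the training loop already executes. The paper's codecs are domain-specific and are not reproduced here; the minimal Python sketch below only illustrates the general pattern with a generic uint16 quantizer. The function names (encode_uint16, decode_uint16) and the min/max scaling scheme are assumptions of this sketch, not the authors' implementation.

    # Hypothetical sketch (not the paper's code): quantize float32 samples
    # to uint16 before writing them to storage, and dequantize at load time.
    import numpy as np

    def encode_uint16(volume: np.ndarray) -> tuple:
        """Quantize a float32 array to uint16 with min/max scaling."""
        lo, hi = float(volume.min()), float(volume.max())
        scale = (hi - lo) or 1.0  # guard against constant fields
        codes = np.round((volume - lo) / scale * 65535.0).astype(np.uint16)
        return codes, lo, scale

    def decode_uint16(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
        """Invert encode_uint16; a cheap elementwise op that vectorizes well."""
        return codes.astype(np.float32) / 65535.0 * scale + lo

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        sample = rng.normal(size=(64, 64, 64)).astype(np.float32)
        codes, lo, scale = encode_uint16(sample)
        restored = decode_uint16(codes, lo, scale)
        print("stored bytes:", codes.nbytes, "vs float32:", sample.nbytes)
        print("max abs error:", float(np.abs(restored - sample).max()))

Halving the bytes per sample directly halves read traffic from storage, and because the decode is a single scale-and-shift per element it maps well onto accelerator hardware next to the training step. Whether such lossy re-encoding preserves training convergence is workload-dependent, which is precisely what the paper evaluates for CosmoFlow and DeepCAM.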