ScaMP:深度学习搜索的可伸缩元并行

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW) Pub Date : 2023-05-01 DOI:10.1109/CCGridW59191.2023.00080

Quentin G. Anthony, Lang Xu, A. Shafi, H. Subramoni, Dhabaleswar K. Panda

{"title":"ScaMP:深度学习搜索的可伸缩元并行","authors":"Quentin G. Anthony, Lang Xu, A. Shafi, H. Subramoni, Dhabaleswar K. Panda","doi":"10.1109/CCGridW59191.2023.00080","DOIUrl":null,"url":null,"abstract":"In this paper, we propose Scalable Meta-Parallelism for Deep Learning Search (ScaMP): a distributed Hyperparameter Optimization (HPO) and Neural Architecture Search (NAS) framework that supports out-of-core models with flexible parallelism schemes. SCaMP is integrated into the modern DL ecosystem, and enables both efficient parallel training of concurrent candidate architectures and aggregate device memory saturation via a powerful load balancing engine. SCaMP estimates the memory requirements of each candidate architecture and automatically applies the appropriate model-parallel degree and maximum batch size supported for the given candidate.We evaluate the benefits of our designs on synthetic training benchmarks and in training a state-of-the-art vision transformer model. We select transformers as a candidate DL model type and demonstrate a 29% improvement in end-to-end HPO time on 32 V100 GPUs on the Lassen and ThetaGPU HPC systems. Further, we demonstrate a reduction in the proportion of NAS time spent in communication from 28% to 15%. Finally, we thoroughly verify the correctness of SCaMP by training a state-of-the-art SwinIR model.","PeriodicalId":341115,"journal":{"name":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ScaMP: Scalable Meta-Parallelism for Deep Learning Search\",\"authors\":\"Quentin G. Anthony, Lang Xu, A. Shafi, H. Subramoni, Dhabaleswar K. Panda\",\"doi\":\"10.1109/CCGridW59191.2023.00080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose Scalable Meta-Parallelism for Deep Learning Search (ScaMP): a distributed Hyperparameter Optimization (HPO) and Neural Architecture Search (NAS) framework that supports out-of-core models with flexible parallelism schemes. SCaMP is integrated into the modern DL ecosystem, and enables both efficient parallel training of concurrent candidate architectures and aggregate device memory saturation via a powerful load balancing engine. SCaMP estimates the memory requirements of each candidate architecture and automatically applies the appropriate model-parallel degree and maximum batch size supported for the given candidate.We evaluate the benefits of our designs on synthetic training benchmarks and in training a state-of-the-art vision transformer model. We select transformers as a candidate DL model type and demonstrate a 29% improvement in end-to-end HPO time on 32 V100 GPUs on the Lassen and ThetaGPU HPC systems. Further, we demonstrate a reduction in the proportion of NAS time spent in communication from 28% to 15%. Finally, we thoroughly verify the correctness of SCaMP by training a state-of-the-art SwinIR model.\",\"PeriodicalId\":341115,\"journal\":{\"name\":\"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGridW59191.2023.00080\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGridW59191.2023.00080","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在本文中，我们提出了深度学习搜索的可扩展元并行(Scalable Meta-Parallelism, ScaMP):一个分布式超参数优化(HPO)和神经结构搜索(NAS)框架，支持具有灵活并行方案的核外模型。SCaMP集成到现代DL生态系统中，并通过强大的负载平衡引擎实现并发候选架构的高效并行训练和聚合设备内存饱和。SCaMP估计每个候选体系结构的内存需求，并自动应用适当的模型并行度和给定候选体系结构支持的最大批大小。我们评估了我们的设计在综合训练基准和训练最先进的视觉变压器模型方面的好处。我们选择变压器作为候选DL模型类型，并在Lassen和ThetaGPU HPC系统上的32个V100 gpu上证明了端到端HPO时间提高了29%。此外，我们证明了NAS在通信中花费的时间比例从28%减少到15%。最后，我们通过训练最先进的SwinIR模型来彻底验证SCaMP的正确性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

ScaMP: Scalable Meta-Parallelism for Deep Learning Search

In this paper, we propose Scalable Meta-Parallelism for Deep Learning Search (ScaMP): a distributed Hyperparameter Optimization (HPO) and Neural Architecture Search (NAS) framework that supports out-of-core models with flexible parallelism schemes. SCaMP is integrated into the modern DL ecosystem, and enables both efficient parallel training of concurrent candidate architectures and aggregate device memory saturation via a powerful load balancing engine. SCaMP estimates the memory requirements of each candidate architecture and automatically applies the appropriate model-parallel degree and maximum batch size supported for the given candidate.We evaluate the benefits of our designs on synthetic training benchmarks and in training a state-of-the-art vision transformer model. We select transformers as a candidate DL model type and demonstrate a 29% improvement in end-to-end HPO time on 32 V100 GPUs on the Lassen and ThetaGPU HPC systems. Further, we demonstrate a reduction in the proportion of NAS time spent in communication from 28% to 15%. Finally, we thoroughly verify the correctness of SCaMP by training a state-of-the-art SwinIR model.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)

自引率

0.00%

发文量