PipeDAP: An Efficient Communication Framework for Scheduling Decoupled All-Reduce Primitives in Distributed DNN Training
Authors: Yunqi Gao; Bing Hu; Mahdi Boloursaz Mashhadi; Wei Wang; Rahim Tafazolli; Mérouane Debbah
Journal: IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 1170-1184 (published 2025-06-02)
Journal impact factor: 5.4 (JCR Q1, Computer Science, Information Systems)
DOI: 10.1109/TETC.2025.3573522
URL: https://ieeexplore.ieee.org/document/11021340/
Communication scheduling improves the scalability of distributed deep learning by overlapping computation and communication tasks during training. However, existing communication scheduling frameworks based on tensor partitioning suffer from two fundamental issues: (1) partitioning schemes at the data volume level introduce extensive startup overheads, leading to higher energy consumption, and (2) partitioning schemes at the communication primitive level do not provide optimal scheduling, resulting in longer training time. In this article, we propose an efficient communication mechanism, PipeDAP, which schedules decoupled all-reduce operations in a near-optimal order to minimize the time and energy consumption of training DNN models. We build a mathematical model for PipeDAP and derive the near-optimal scheduling order of the reduce-scatter and all-gather operations. In addition, we leverage simultaneous communication of reduce-scatter and all-gather operations to further reduce the startup overheads. We implement the PipeDAP architecture on the PyTorch framework and apply it to distributed training of benchmark DNN models. Experimental results on two GPU clusters demonstrate that PipeDAP achieves up to 1.82x speedup and saves up to 45.4% of energy consumption compared to state-of-the-art communication scheduling frameworks.
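As a rough illustration of what "decoupled all-reduce" means, the following pure-Python sketch simulates several workers in one process and checks that a reduce-scatter followed by an all-gather yields the same result as a single all-reduce. It is not PipeDAP's implementation and does not model its scheduling order, its cost model, or real network communication; all function names (`chunk_bounds`, `reduce_scatter`, `all_gather`, `all_reduce`) and the sample gradients are assumptions made for this example.

```python
from typing import List

Vector = List[float]


def chunk_bounds(length: int, parts: int, index: int) -> range:
    """Indices owned by worker `index` when a vector is split into `parts` chunks."""
    base, extra = divmod(length, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return range(start, stop)


def reduce_scatter(grads: List[Vector]) -> List[Vector]:
    """Phase 1: each worker ends up holding the element-wise sum of one chunk."""
    n_workers, length = len(grads), len(grads[0])
    return [
        [sum(g[i] for g in grads) for i in chunk_bounds(length, n_workers, w)]
        for w in range(n_workers)
    ]


def all_gather(chunks: List[Vector]) -> List[Vector]:
    """Phase 2: every worker receives all reduced chunks, in worker order."""
    full = [x for chunk in chunks for x in chunk]
    return [list(full) for _ in chunks]


def all_reduce(grads: List[Vector]) -> List[Vector]:
    """Reference: every worker receives the full element-wise sum in one step."""
    total = [sum(column) for column in zip(*grads)]
    return [list(total) for _ in grads]


# Hypothetical gradients held by three simulated workers (illustrative values).
grads = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 20.0, 30.0, 40.0, 50.0],
    [100.0, 200.0, 300.0, 400.0, 500.0],
]

scattered = reduce_scatter(grads)   # one reduced chunk per worker
decoupled = all_gather(scattered)   # full reduced vector on every worker

# The two-phase (decoupled) result matches the monolithic all-reduce.
assert decoupled == all_reduce(grads)
```

Because the two phases are separate operations, a scheduler is free to order them independently across tensors, which is the degree of freedom the abstract says PipeDAP exploits.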
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, and Health-care IT.