Research on distributed heterogeneous task scheduling and resource allocation algorithms based on deep learning
Qiu Zhen, Fan Xu, Wenpu Li, Fan Yang, Hongyu Wu, Huanhuan Li
Other Conferences, published 2024-06-06. DOI: 10.1117/12.3032073 (https://doi.org/10.1117/12.3032073)
Citations: 0
Abstract
As deep learning develops and spreads, its datasets and network models keep growing, and distributed model training is becoming increasingly common. This article proposes a distributed heterogeneous task scheduling and resource allocation algorithm based on deep learning. It targets problems that arise during distributed collaborative training, such as heterogeneous resource usage, unpredictable task convergence time, communication-time bottlenecks, and resource waste caused by static resource allocation. The algorithm schedules heterogeneous tasks and allocates resources dynamically, reducing task completion time in clusters. Experiments show that the proposed algorithm significantly improves both task completion time and system duration.