Cheng-Ying Wu, Qi Zhao, Cheng-Yu Cheng, Yuchen Yang, Muhammad Qureshi, Hang Liu, Genshe Chen
Title: Machine learning-based real-time task scheduling for Apache Storm
Published in: Defense + Commercial Sensing, volume 110 8, pages 130620I - 130620I-9
Publication date: 2024-06-06
DOI: 10.1117/12.3021842 (https://doi.org/10.1117/12.3021842)
Citation count: 0
Abstract
Apache Storm is a popular open-source distributed computing platform for real-time big-data processing. However, the existing task scheduling algorithms for Apache Storm do not adequately account for the heterogeneity and dynamics of node computing resources and task demands, which leads to high processing latency and suboptimal performance. In this work, we propose a machine learning-based task scheduling scheme tailored for Apache Storm. The scheme uses machine learning models to predict task performance and assigns each task to the computation node with the lowest predicted processing latency. In our design, each node runs a machine learning-based monitoring mechanism. When the master node schedules a new task, it queries the computation nodes, obtains their available resources and processing latency predictions, and uses them to make the assignment decision. We explored three machine learning models: Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Deep Belief Network (DBN). In our experiments, LSTM achieved the most accurate latency predictions. The evaluation results show that Apache Storm with the proposed LSTM-based scheduling scheme significantly reduces task processing delay and improves resource utilization compared to the existing algorithms.
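
The assignment step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Apache Storm API. The names NodeReport, Task, and pick_node, the report fields, and the resource feasibility check are all assumptions made for illustration. The latency value stands in for whatever the node-local model (for example an LSTM) would predict.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeReport:
    """Hypothetical reply a computation node sends when the master queries it."""
    node_id: str
    free_cpu: float               # available CPU cores (assumed unit)
    free_mem_mb: float            # available memory in MB (assumed unit)
    predicted_latency_ms: float   # node-local model's latency prediction for the task


@dataclass
class Task:
    """Hypothetical description of a task's resource demand."""
    task_id: str
    cpu_demand: float
    mem_demand_mb: float


def pick_node(task: Task, reports: List[NodeReport]) -> Optional[str]:
    """Return the id of the node with the lowest predicted latency.

    Assumption: nodes that cannot cover the task's CPU and memory demand
    are skipped before comparing predictions. Returns None if no node fits.
    """
    feasible = [
        r for r in reports
        if r.free_cpu >= task.cpu_demand and r.free_mem_mb >= task.mem_demand_mb
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.predicted_latency_ms).node_id


if __name__ == "__main__":
    reports = [
        NodeReport("node-a", free_cpu=4, free_mem_mb=8192, predicted_latency_ms=12.0),
        NodeReport("node-b", free_cpu=1, free_mem_mb=1024, predicted_latency_ms=5.0),
        NodeReport("node-c", free_cpu=4, free_mem_mb=8192, predicted_latency_ms=9.5),
    ]
    task = Task("t1", cpu_demand=2, mem_demand_mb=2048)
    print(pick_node(task, reports))
```

In this sketch node-b has the lowest prediction but too few free resources, so the choice falls to the feasible node with the next lowest prediction. Whether the proposed scheme filters on resources before or after comparing predictions is not stated in the abstract.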