DRLPart: A Deep Reinforcement Learning Framework for Optimally Efficient and Robust Resource Partitioning on Commodity Servers

Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing. Pub Date: 2020-06-21. DOI: 10.1145/3431379.3460648

Ruobing Chen, Jinping Wu, Haosen Shi, Yusen Li, Xiaoguang Liu, Gang Wang


Citations: 10

Abstract


Workload consolidation is a commonly used approach for improving the resource utilization of commodity servers. However, colocated workloads often suffer significant performance degradation due to resource contention, which makes resource partitioning an important research problem. Partitioning multiple resources in a coordinated manner is particularly challenging because of complex contention behaviors and a huge solution space, and this problem is not well addressed in the literature. In this paper, we propose a deep reinforcement learning (DRL) framework, named DRLPart, for partitioning multiple resources in a coordinated manner. DRLPart learns the optimal partitioning decision from easy-to-collect real-time system state, without the need for domain knowledge or handcrafted search heuristics. We solve two critical challenges of applying DRL to the resource partitioning problem. First, we build a deep-learning-based performance model that significantly reduces training overhead by estimating the rewards of actions without interacting with the real system. Second, we propose a fine-tuning process that improves bad decisions occasionally made by the DRL model, which enhances its adaptivity to new situations. Results from extensive evaluations show that the proposed framework is optimally efficient and robust, improving system throughput by 13.3% to 18.5% compared to state-of-the-art baselines.
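The abstract attributes part of the difficulty of coordinated partitioning to its "huge solution space." The following Python sketch is not taken from the paper; it only illustrates that arithmetic under stated assumptions. Each resource is assumed to be divided into identical units, every colocated workload is assumed to receive at least one unit of every resource, and all resources are partitioned jointly, so the per-resource counts multiply. The server configuration (10 cores, 11 LLC ways, 10 memory-bandwidth steps) and the helper names `splits`, `joint_space_size`, and `brute_force_splits` are hypothetical.

```python
from itertools import product
from math import comb, prod


def splits(units: int, n: int) -> int:
    """Ways to divide `units` identical units among `n` workloads,
    each workload getting at least one unit (stars and bars)."""
    if n < 1 or units < n:
        return 0
    return comb(units - 1, n - 1)


def joint_space_size(resources: dict[str, int], n: int) -> int:
    """Coordinated partitioning (assumed): every resource is split
    independently, so the per-resource counts multiply."""
    return prod(splits(u, n) for u in resources.values())


def brute_force_splits(units: int, n: int) -> int:
    """Cross-check `splits` by enumerating every allocation vector."""
    return sum(
        1
        for alloc in product(range(1, units + 1), repeat=n)
        if sum(alloc) == units
    )


# Hypothetical server; the numbers are illustrative, not from the paper.
SERVER = {"cores": 10, "llc_ways": 11, "mem_bw_steps": 10}

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, joint_space_size(SERVER, n))
    # prints:
    # 2 810
    # 3 58320
    # 4 846720
```

Under these assumptions the joint space grows from 810 configurations for two workloads to 846,720 for four, which suggests why exhaustive search, or measuring each candidate action on a real system, quickly becomes impractical. The paper's own action space may be defined differently.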

