Leveraging Domain Knowledge for Robust Deep Reinforcement Learning in Networking

Ying Zheng, Haoyu Chen, Qingyang Duan, Lixiang Lin, Yiyang Shao, Wei Wang, Xin Wang, Yuedong Xu
{"title":"Leveraging Domain Knowledge for Robust Deep Reinforcement Learning in Networking","authors":"Ying Zheng, Haoyu Chen, Qingyang Duan, Lixiang Lin, Yiyang Shao, Wei Wang, Xin Wang, Yuedong Xu","doi":"10.1109/INFOCOM42981.2021.9488863","DOIUrl":null,"url":null,"abstract":"The past few years has witnessed a surge of interest towards deep reinforcement learning (Deep RL) in computer networks. With extraordinary ability of feature extraction, Deep RL has the potential to re-engineer the fundamental resource allocation problems in networking without relying on pre-programmed models or assumptions about dynamic environments. However, such black-box systems suffer from poor robustness, showing high performance variance and poor tail performance. In this work, we propose a unified Teacher-Student learning framework that harnesses rich domain knowledge to improve robustness. The domain-specific algorithms, less performant but more trustable than Deep RL, play the role of teachers providing advice at critical states; the student neural network is steered to maximize the expected reward as usual and mimic the teacher’s advice meanwhile. The Teacher-Student method comprises of three modules where the confidence check module locates wrong decisions and risky decisions, the reward shaping module designs a new updating function to incentive the learning of student network, and the prioritized experience replay module to effectively utilize the advised actions. We further implement our Teacher-Student framework in existing video streaming (Pensieve), load balancing (DeepLB) and TCP congestion control (Aurora). Experimental results manifest that the proposed approach reduces the performance standard deviation of DeepLB by 37%; it improves the 90th, 95th and 99th tail performance of Pensieve by 7.6%, 8.8%, 10.7% respectively; and it accelerates the rate of growth of Aurora by 2x at the initial stage, and achieves a more stable performance in dynamic environments.","PeriodicalId":293079,"journal":{"name":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOM42981.2021.9488863","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The past few years have witnessed a surge of interest in deep reinforcement learning (Deep RL) in computer networks. With its extraordinary feature-extraction ability, Deep RL has the potential to re-engineer fundamental resource allocation problems in networking without relying on pre-programmed models or assumptions about dynamic environments. However, such black-box systems suffer from poor robustness, exhibiting high performance variance and poor tail performance. In this work, we propose a unified Teacher-Student learning framework that harnesses rich domain knowledge to improve robustness. Domain-specific algorithms, less performant but more trustworthy than Deep RL, play the role of teachers that provide advice at critical states; the student neural network is steered to maximize the expected reward as usual while also mimicking the teacher's advice. The Teacher-Student method comprises three modules: the confidence check module locates wrong and risky decisions, the reward shaping module designs a new update function to incentivize the learning of the student network, and the prioritized experience replay module effectively utilizes the advised actions. We further implement our Teacher-Student framework in existing systems for video streaming (Pensieve), load balancing (DeepLB), and TCP congestion control (Aurora). Experimental results show that the proposed approach reduces the performance standard deviation of DeepLB by 37%; improves the 90th, 95th, and 99th percentile tail performance of Pensieve by 7.6%, 8.8%, and 10.7% respectively; and doubles Aurora's rate of growth in the initial stage while achieving more stable performance in dynamic environments.
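To make the three modules concrete, here is a minimal PyTorch sketch of how a confidence check, a teacher-mimicking loss term, and prioritized sampling of advised transitions might fit together. All names, the 0.6 confidence threshold, the mimic_weight coefficient, and the exact loss form are illustrative assumptions on our part; the abstract does not specify the paper's actual update rule.

```python
import torch
import torch.nn.functional as F

def confidence_check(student_probs: torch.Tensor, threshold: float = 0.6) -> torch.Tensor:
    # Flag "risky" states: the student's top action probability is low,
    # so advice from the teacher (a domain-specific algorithm) is requested.
    # The 0.6 threshold is an assumed value for illustration.
    top_prob, _ = student_probs.max(dim=-1)
    return top_prob < threshold

def teacher_student_loss(log_probs, actions, advantages,
                         teacher_actions, advised_mask, mimic_weight=0.5):
    # Standard policy-gradient term: maximize the expected reward as usual.
    pg_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
                * advantages).mean()
    # Mimic term: cross-entropy toward the teacher's advised actions,
    # applied only at the states flagged by the confidence check.
    if advised_mask.any():
        mimic_loss = F.nll_loss(log_probs[advised_mask],
                                teacher_actions[advised_mask])
    else:
        mimic_loss = torch.zeros((), device=log_probs.device)
    return pg_loss + mimic_weight * mimic_loss

def sample_prioritized(priorities: torch.Tensor, batch_size: int) -> torch.Tensor:
    # Prioritized experience replay: transitions carrying teacher advice
    # can be given larger priorities so advised actions are replayed more often.
    return torch.multinomial(priorities, batch_size, replacement=True)

# Toy usage on random data.
if __name__ == "__main__":
    batch, n_actions = 8, 4
    logits = torch.randn(batch, n_actions, requires_grad=True)
    log_probs = F.log_softmax(logits, dim=-1)
    actions = torch.randint(n_actions, (batch,))
    advantages = torch.randn(batch)
    teacher_actions = torch.randint(n_actions, (batch,))
    advised = confidence_check(log_probs.exp())
    loss = teacher_student_loss(log_probs, actions, advantages,
                                teacher_actions, advised)
    loss.backward()
    print(float(loss))
```

Restricting the mimic term to states flagged by the confidence check keeps the teacher from over-constraining the student where it is already confident, matching the framework's stated goal of providing advice only at critical states.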