Self-play reinforcement learning for video transmission

Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video. Pub Date: 2020-05-26. DOI: 10.1145/3386290.3396930

Tianchi Huang, Ruixiao Zhang, Lifeng Sun


Citations: 10

Abstract

Video transmission services use adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated with a function that linearly combines several weighted metrics. However, we observe that such a function fails to describe the requirement accurately, so methods optimized against it may end up violating the original requirement. To address this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei updates its policy by using the actual requirement directly. Technically, Zwei samples a number of trajectories from the same starting point and immediately estimates the win rate with respect to the competition outcome, where the competition outcome indicates which trajectory is closer to the assigned requirement. Zwei then optimizes the policy by maximizing the win rate. To build Zwei, we develop simulation environments, design suitable neural network models, and devise training methods that handle different requirements across various video transmission scenarios. Trace-driven analysis over two representative tasks demonstrates that Zwei faithfully optimizes itself according to the assigned requirement, outperforming state-of-the-art methods in all considered scenarios.
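The win-rate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Trajectory` fields, the `beats` rule (less rebuffering first, then higher bitrate), and the all-pairs comparison are assumptions chosen only to show how a batch of rollouts from one starting point could be scored by competition rather than by a weighted sum of metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Summary of one rollout; both fields are hypothetical metrics."""
    rebuffer_s: float   # total rebuffering time in seconds
    avg_bitrate: float  # mean video bitrate in Mbps


def beats(a: Trajectory, b: Trajectory) -> float:
    """Return 1.0 if a wins, 0.0 if a loses, 0.5 on a tie.

    Assumed requirement (not from the paper): prefer less rebuffering,
    and break ties by higher average bitrate.
    """
    if a.rebuffer_s != b.rebuffer_s:
        return 1.0 if a.rebuffer_s < b.rebuffer_s else 0.0
    if a.avg_bitrate != b.avg_bitrate:
        return 1.0 if a.avg_bitrate > b.avg_bitrate else 0.0
    return 0.5


def win_rates(batch: List[Trajectory]) -> List[float]:
    """Average competition outcome of each trajectory against all others."""
    n = len(batch)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    totals = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                totals[i] += beats(batch[i], batch[j])
    return [total / (n - 1) for total in totals]


# Three rollouts sampled from the same starting point.
batch = [Trajectory(0.0, 2.0), Trajectory(0.0, 3.0), Trajectory(1.5, 4.0)]
print(win_rates(batch))
```

In a training loop, win rates like these could stand in for the return when updating the policy. The abstract does not say which optimizer or comparison scheme Zwei uses, so that step is left out here.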

