Self-play reinforcement learning for video transmission

Tianchi Huang, Ruixiao Zhang, Lifeng Sun
Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video, published 2020-05-26.
DOI: 10.1145/3386290.3396930 (https://doi.org/10.1145/3386290.3396930)
Citations: 10

Abstract

Video transmission services rely on adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated with a function that linearly combines several weighted metrics. However, we observe that such a function fails to describe the actual requirement accurately, so methods optimized against it may end up violating the original needs. To address this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei updates its policy by using the actual requirement directly. Technically, Zwei samples a number of trajectories from the same starting point and immediately estimates the win rate from the competition outcome, where the competition result indicates which trajectory is closer to the assigned requirement. Zwei then optimizes its strategy by maximizing the win rate. To build Zwei, we develop simulation environments, design suitable neural network models, and devise training methods for handling different requirements across various video transmission scenarios. Trace-driven analysis over two representative tasks demonstrates that Zwei faithfully optimizes itself according to the assigned requirement, outperforming state-of-the-art methods under all considered scenarios.
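The win-rate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the trajectory summary (average bitrate, total rebuffering time), the requirement rule in `closer_to_requirement`, and the function names are all assumptions chosen for illustration. The sketch only shows how trajectories sampled from the same starting point could be compared pairwise against a requirement and turned into per-trajectory win rates.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical trajectory summary: (average bitrate in kbps, total rebuffering in seconds).
Outcome = Tuple[float, float]


def closer_to_requirement(a: Outcome, b: Outcome) -> int:
    """Return 1 if `a` wins, -1 if `b` wins, 0 for a tie.

    Assumed requirement (illustrative only): less rebuffering wins;
    if rebuffering is equal, the higher bitrate wins.
    """
    if a[1] != b[1]:
        return 1 if a[1] < b[1] else -1
    if a[0] != b[0]:
        return 1 if a[0] > b[0] else -1
    return 0


def estimate_win_rates(
    outcomes: Sequence[Outcome],
    compare: Callable[[Outcome, Outcome], int] = closer_to_requirement,
) -> List[float]:
    """Pairwise win rate of each trajectory against all the others.

    A win counts as 1, a tie as 0.5, a loss as 0.
    """
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        result = compare(outcomes[i], outcomes[j])
        if result > 0:
            wins[i] += 1.0
        elif result < 0:
            wins[j] += 1.0
        else:
            wins[i] += 0.5
            wins[j] += 0.5
    return [w / (n - 1) for w in wins]


if __name__ == "__main__":
    # Three made-up trajectories sampled from the same starting point.
    sampled = [(1200.0, 0.0), (2500.0, 3.0), (1800.0, 0.0)]
    print(estimate_win_rates(sampled))  # [0.5, 0.0, 1.0]
```

In a training loop, such win rates could serve as the learning signal in place of a linearly weighted metric score; how Zwei actually feeds them into its policy update is not specified in the abstract.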