Adaptive Video Streaming via Deep Reinforcement Learning from User Trajectory Preferences

Qingyu Xiao, Jin Ye, Chengjie Pang, Liangdi Ma, Wenchao Jiang
{"title":"Adaptive Video Streaming via Deep Reinforcement Learning from User Trajectory Preferences","authors":"Qingyu Xiao, Jin Ye, Chengjie Pang, Liangdi Ma, Wenchao Jiang","doi":"10.1109/IPCCC50635.2020.9391533","DOIUrl":null,"url":null,"abstract":"Client-side adaptive bitrate (ABR) algorithms based on deep reinforcement learning (RL) can continuously improve its adaptability to network conditions. However, most existing methods adopt fixed reward functions to train the ABR policy, which leads the results being not consistent with user-perceived quality of experience (QoE) in a long duration under various network conditions. In order to optimize the QoE, this paper proposes a novel ABR algorithm considering user preference based on short trajectory segments. The user-specific preference feedback, which is selected by the user from a pair of short track segments in advance, is collected and applied to define the training goal of RL. Specifically, we train a deep neural network to define the RL reward and integrate it with A3C-based ABR algorithm. The experiment results show that the accuracy of the proposed reward model outperforms most existing fixed reward functions by 13.6% in user preference prediction, and the optimized ABR algorithm improves QoE by 16.4% on average.","PeriodicalId":226034,"journal":{"name":"2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPCCC50635.2020.9391533","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Client-side adaptive bitrate (ABR) algorithms based on deep reinforcement learning (RL) can continuously improve their adaptability to network conditions. However, most existing methods adopt a fixed reward function to train the ABR policy, which leads to results that are inconsistent with user-perceived quality of experience (QoE) over long durations and varying network conditions. To optimize QoE, this paper proposes a novel ABR algorithm that accounts for user preferences over short trajectory segments. User-specific preference feedback, collected in advance by asking the user to choose between pairs of short trajectory segments, is used to define the training goal of the RL agent. Specifically, we train a deep neural network to define the RL reward and integrate it with an A3C-based ABR algorithm. Experimental results show that the proposed reward model outperforms most existing fixed reward functions by 13.6% in user preference prediction accuracy, and the optimized ABR algorithm improves QoE by 16.4% on average.
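The paper itself provides no code here; as a rough illustration of the pairwise-preference reward learning it describes, the sketch below trains a small network with a Bradley-Terry style loss over segment pairs, in the spirit of preference-based RL (Christiano et al., 2017). The names (`RewardNet`, `preference_loss`), the three-feature chunk layout (bitrate, rebuffer time, bitrate switch), and the network size are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a pairwise-preference reward model for ABR.
# Feature layout, network size, and all names are assumptions,
# not the paper's actual implementation.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps each chunk's features to a scalar reward; a segment's
    reward is the sum over its chunks."""
    def __init__(self, feat_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, chunks, feat_dim) -> (batch,) summed reward
        return self.net(segment).squeeze(-1).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, pref):
    """Bradley-Terry loss: pref[i] = 1 if the user preferred seg_a[i]."""
    logits = model(seg_a) - model(seg_b)  # (batch,)
    return nn.functional.binary_cross_entropy_with_logits(
        logits, pref.float())

if __name__ == "__main__":
    # Toy data: features per chunk = (bitrate, rebuffer_time, switch)
    model = RewardNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    seg_a = torch.rand(32, 8, 3)   # 32 pairs, 8 chunks per segment
    seg_b = torch.rand(32, 8, 3)
    pref = torch.randint(0, 2, (32,))  # user's binary choices
    for _ in range(100):
        opt.zero_grad()
        loss = preference_loss(model, seg_a, seg_b, pref)
        loss.backward()
        opt.step()
```

Once such a model is trained, its per-chunk output would presumably replace the fixed linear QoE reward in the A3C training loop, so the policy gradient is driven by learned user preferences rather than a hand-tuned weighting of bitrate, rebuffering, and smoothness.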