Adaptive Video Streaming via Deep Reinforcement Learning from User Trajectory Preferences
Qingyu Xiao, Jin Ye, Chengjie Pang, Liangdi Ma, Wenchao Jiang
2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC), November 6, 2020. DOI: 10.1109/IPCCC50635.2020.9391533
Client-side adaptive bitrate (ABR) algorithms based on deep reinforcement learning (RL) can continuously improve their adaptability to network conditions. However, most existing methods adopt fixed reward functions to train the ABR policy, which leads to results that are not consistent with user-perceived quality of experience (QoE) over long durations and under varying network conditions. To optimize QoE, this paper proposes a novel ABR algorithm that accounts for user preferences over short trajectory segments. User-specific preference feedback, collected in advance by asking the user to choose between pairs of short trajectory segments, is used to define the training goal of the RL agent. Specifically, we train a deep neural network to define the RL reward and integrate it with an A3C-based ABR algorithm. The experimental results show that the proposed reward model outperforms most existing fixed reward functions by 13.6% in user-preference prediction accuracy, and the optimized ABR algorithm improves QoE by 16.4% on average.
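To illustrate the core idea of learning a reward from pairwise segment preferences, the sketch below trains a small reward network with a Bradley-Terry style loss: the user's choice between two short trajectory segments supervises which segment should receive the higher predicted return. This is a minimal illustration assuming a per-chunk feature layout of [bitrate, rebuffer time, |bitrate change|]; the network architecture, feature set, and loss form are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a preference-based reward model
# for ABR: a per-chunk reward network whose segment-level scores are trained
# so that the user-preferred segment of each pair scores higher.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps per-chunk ABR features (assumed: bitrate, rebuffer time,
    |bitrate change|) to a scalar reward; a segment's score is the sum
    of its per-chunk rewards."""
    def __init__(self, feature_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_score(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (num_chunks, feature_dim) -> scalar predicted return
        return self.net(segment).sum()

def preference_loss(model: RewardNet,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    label: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: label = 1.0 if the user preferred
    segment A, 0.0 if they preferred segment B."""
    logits = torch.stack([model.segment_score(seg_a),
                          model.segment_score(seg_b)])
    # Probability that A is preferred over B under the learned reward.
    p_a = torch.softmax(logits, dim=0)[0]
    return -(label * torch.log(p_a + 1e-8)
             + (1 - label) * torch.log(1 - p_a + 1e-8))

if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Toy pair of 5-chunk segments; the user preferred the smoother,
    # rebuffer-free segment A over the higher-bitrate but stalling B.
    seg_a = torch.tensor([[3.0, 0.0, 0.0]] * 5)
    seg_b = torch.tensor([[4.3, 1.5, 1.3]] * 5)
    label = torch.tensor(1.0)
    for _ in range(100):
        opt.zero_grad()
        loss = preference_loss(model, seg_a, seg_b, label)
        loss.backward()
        opt.step()
    print("final preference loss:", loss.item())
```

Once trained, the segment score from such a model can stand in for a hand-designed QoE reward when training the A3C-based ABR policy, which is the integration step the abstract describes.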