Adaptive Video Streaming via Deep Reinforcement Learning from User Trajectory Preferences
Qingyu Xiao, Jin Ye, Chengjie Pang, Liangdi Ma, Wenchao Jiang
2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC), November 6, 2020. DOI: 10.1109/IPCCC50635.2020.9391533
Client-side adaptive bitrate (ABR) algorithms based on deep reinforcement learning (RL) can continuously improve their adaptability to network conditions. However, most existing methods adopt fixed reward functions to train the ABR policy, which leads to results that are not consistent with user-perceived quality of experience (QoE) over long durations and under varying network conditions. To optimize QoE, this paper proposes a novel ABR algorithm that accounts for user preferences over short trajectory segments. User-specific preference feedback, collected in advance by asking the user to choose between pairs of short trajectory segments, is used to define the training goal of the RL agent. Specifically, we train a deep neural network to define the RL reward and integrate it with an A3C-based ABR algorithm. The experimental results show that the proposed reward model outperforms most existing fixed reward functions by 13.6% in user-preference prediction accuracy, and the optimized ABR algorithm improves QoE by 16.4% on average.
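To illustrate the core idea of learning a reward from pairwise segment preferences, the sketch below trains a small reward network with a Bradley-Terry style loss: the user's choice between two short trajectory segments supervises which segment should receive the higher predicted return. This is a minimal illustration assuming a per-chunk feature layout of [bitrate, rebuffer time, |bitrate change|]; the network architecture, feature set, and loss form are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a preference-based reward model
# for ABR: a per-chunk reward network whose segment-level scores are trained
# so that the user-preferred segment of each pair scores higher.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps per-chunk ABR features (assumed: bitrate, rebuffer time,
    |bitrate change|) to a scalar reward; a segment's score is the sum
    of its per-chunk rewards."""
    def __init__(self, feature_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_score(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (num_chunks, feature_dim) -> scalar predicted return
        return self.net(segment).sum()

def preference_loss(model: RewardNet,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    label: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: label = 1.0 if the user preferred
    segment A, 0.0 if they preferred segment B."""
    logits = torch.stack([model.segment_score(seg_a),
                          model.segment_score(seg_b)])
    # Probability that A is preferred over B under the learned reward.
    p_a = torch.softmax(logits, dim=0)[0]
    return -(label * torch.log(p_a + 1e-8)
             + (1 - label) * torch.log(1 - p_a + 1e-8))

if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Toy pair of 5-chunk segments; the user preferred the smoother,
    # rebuffer-free segment A over the higher-bitrate but stalling B.
    seg_a = torch.tensor([[3.0, 0.0, 0.0]] * 5)
    seg_b = torch.tensor([[4.3, 1.5, 1.3]] * 5)
    label = torch.tensor(1.0)
    for _ in range(100):
        opt.zero_grad()
        loss = preference_loss(model, seg_a, seg_b, label)
        loss.backward()
        opt.step()
    print("final preference loss:", loss.item())
```

Once trained, the segment score from such a model can stand in for a hand-designed QoE reward when training the A3C-based ABR policy, which is the integration step the abstract describes.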