Viewing the 360° Future: Trade-Off Between User Field-of-View Prediction, Network Bandwidth, and Delay
Shahryar Afzal, Jiasi Chen, K. Ramakrishnan
2020 29th International Conference on Computer Communications and Networks (ICCCN), published 2020-08-01
DOI: 10.1109/ICCCN49398.2020.9209659 (https://doi.org/10.1109/ICCCN49398.2020.9209659)
Citations: 4
Abstract
Accurately predicting a user’s field-of-view (FoV) can significantly reduce the high bandwidth requirements of 360° video streaming, as it enables sending only the tiles corresponding to the predicted FoV. Because many approaches to user head-orientation (i.e., FoV) prediction have been proposed in the literature, ranging from simple linear regression to more complex neural networks, it is difficult to decide which method to use. To address this gap, we benchmark user FoV prediction algorithms over an aggregation of multiple datasets and study the implications of this analysis. Our results demonstrate that it is indeed difficult for any prediction algorithm to accurately predict a user’s FoV beyond a very short future time window of approximately 300 ms. We also observe that users’ viewing behavior is dominated by sideways head movement rather than up-and-down movement. These findings have implications for network bandwidth, latency, and playback buffering at the client: (1) Extra "padding" tiles are needed around the user’s FoV to correct for prediction errors; in particular, rectangular padding achieves a lower stall rate than square padding for the same bandwidth usage; (2) Video playout buffers, network delay, and jitter need to be small to avoid stale predictions of the user’s FoV, which are valid only about 300 ms into the future; (3) Per-video and per-user personalization of the padding can save bandwidth for slow-moving users or videos. We mathematically quantify these trade-offs and present simulation results to demonstrate these findings and implications. Our results have implications for FoV prediction methods in future 360° streaming systems.
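To make the ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows (a) a simple linear-regression extrapolation of head yaw over a 300 ms horizon and (b) selecting the tiles covered by the predicted FoV plus padding that is wider horizontally than vertically. The function names, the 12×6 equirectangular tile grid, the 90° FoV, and the padding values are all assumptions chosen for illustration; none of them comes from the paper.

```python
import math


def predict_yaw_linear(times_s, yaw_deg, horizon_s=0.3):
    """Extrapolate head yaw with a least-squares line over recent samples.

    Hypothetical helper: needs at least two samples with distinct timestamps.
    """
    # Unwrap angles so that a 355 -> 0 degree step counts as +5, not -355.
    unwrapped = [yaw_deg[0]]
    for y in yaw_deg[1:]:
        step = (y - unwrapped[-1] + 180.0) % 360.0 - 180.0
        unwrapped.append(unwrapped[-1] + step)

    # Closed-form ordinary least squares for a line yaw = a + slope * t.
    n = len(times_s)
    t_mean = sum(times_s) / n
    y_mean = sum(unwrapped) / n
    var_t = sum((t - t_mean) ** 2 for t in times_s)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, unwrapped))
    slope = cov / var_t

    # Evaluate the fitted line horizon_s seconds after the last sample.
    predicted = y_mean + slope * (times_s[-1] + horizon_s - t_mean)
    return predicted % 360.0


def tiles_to_send(yaw_deg, pitch_deg, fov_h=90.0, fov_v=90.0,
                  pad_h=30.0, pad_v=0.0, cols=12, rows=6):
    """Return the set of (row, col) tiles overlapping the padded FoV.

    Assumes an equirectangular frame split into a rows x cols grid, with yaw
    in [0, 360) and pitch in [-90, 90]. pad_h > pad_v gives rectangular padding.
    """
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows
    half_w = fov_h / 2.0 + pad_h
    half_h = fov_v / 2.0 + pad_v

    # Columns wrap around the 0/360 degree seam.
    col_lo = math.floor((yaw_deg - half_w) / tile_w)
    col_hi = math.floor((yaw_deg + half_w) / tile_w)
    if col_hi - col_lo + 1 >= cols:
        col_ids = list(range(cols))
    else:
        col_ids = [c % cols for c in range(col_lo, col_hi + 1)]

    # Rows are clamped at the poles instead of wrapping.
    pitch_lo = max(-90.0, pitch_deg - half_h)
    pitch_hi = min(90.0, pitch_deg + half_h)
    row_lo = math.floor((pitch_lo + 90.0) / tile_h)
    row_hi = min(rows - 1, math.floor((pitch_hi + 90.0) / tile_h))

    return {(r, c) for r in range(row_lo, row_hi + 1) for c in col_ids}
```

A client could call `predict_yaw_linear` on the last few head-orientation samples and pass the result to `tiles_to_send`. Raising `pad_h` while keeping `pad_v` small spends the padding budget in the horizontal direction, where the abstract reports most head movement occurs. How much padding is actually needed is a question the paper's analysis addresses; this sketch does not answer it.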