{"title":"Improving streaming quality and bitrate efficiency with dynamic resolution selection","authors":"X. Ducloux, Patrick Gendron, T. Fautier","doi":"10.1145/3510450.3517304","DOIUrl":"https://doi.org/10.1145/3510450.3517304","url":null,"abstract":"Dynamic Resolution Selection is a technology that has been deployed by Netflix with its per-scene encoding mechanism applied to VOD assets. The technology is based on a posteriori analysis of all the encoded resolutions to determine the best resolution for a given scene, in terms of quality and bandwidth used, based on VMAF analysis. It cannot be applied to live content, as it would require too much processing power and can't be used in real time. The method proposed in this paper is based on a machine learning (ML) mechanism that learns how to pick the best resolution to be encoded in a supervised learning environment. At run time, using the already existing pre-processing stage, the live encoder can decide on the best resolution to encode, without adding any processing complexity or delay. This results in higher quality of experience (QoE) or lower bitrate, as well as lower CPU footprint vs. a classical fixed ladder approach. This paper will present the results obtained for live HD or 4K content delivery across different networks, including classical TS (DVB), native IP (ATSC 3.0) and ABR (DASH/HLS). In addition, the paper will report on the interoperability results of tested devices.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121112820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On multiple media representations and CDN performance","authors":"Y. Reznik, Thiago Teixeira, Robert Peck","doi":"10.1145/3510450.3517320","DOIUrl":"https://doi.org/10.1145/3510450.3517320","url":null,"abstract":"This paper proposes a mathematical model describing the effects of using multiple media representations on CDN performance in HTTP-based streaming systems. Specifically, we look at cases of using multiple versions of the same content packaged differently and derive an asymptotic formula for CDN cache-miss probability considering parameters of the content's distribution and the distribution of formats used for packaging and delivery. We then study the validity of this proposed formula by considering statistics collected for several streaming deployments using mixed HLS and DASH packaging and show that it predicts the experimentally observed data reasonably well. We further discuss several possible extensions and applications of this proposed model.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121358162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of MPEG-5 part 2 (LCEVC) for live gaming video streaming applications","authors":"Nabajeet Barman, Steven Schmidt, Saman Zadtootaghaj, M. Martini","doi":"10.1145/3510450.3517279","DOIUrl":"https://doi.org/10.1145/3510450.3517279","url":null,"abstract":"This paper presents an evaluation of the latest MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC) for live gaming video streaming applications. The results are presented in terms of both objective and subjective quality measures. Our results indicate that LCEVC outperforms both x264 and x265 codecs in terms of bitrate savings using VMAF. Using subjective results, it is found that LCEVC outperforms the respective base codecs, especially for low bitrates. This effect is much more dominant for x264 as compared to x265, with marginal absolute improvement of quality scores for x265.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121695411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deploying the ITU-T P.1203 QoE model in the wild and retraining for new codecs","authors":"W. Robitza, Rakesh Rao Ramachandra Rao, Steve Göring, A. Dethof, A. Raake","doi":"10.1145/3510450.3517310","DOIUrl":"https://doi.org/10.1145/3510450.3517310","url":null,"abstract":"This paper presents two challenges associated with using the ITU-T P.1203 standard for video quality monitoring in practice. We discuss the issue of unavailable data on certain browsers/platforms and the lack of information within newly developed data formats like Common Media Client Data. We also re-trained the coefficients of the P.1203.1 video model for newer codecs, and published a completely new model derived from the P.1204.3 bitstream model.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"364 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116365591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using CMAF to deliver high resolution immersive video with ultra-low end to end latency for live streaming","authors":"Andrew Zhang, XiaoMing Chen, Ying Luo, Anna Qingfeng Li, William Cheung","doi":"10.1145/3510450.3517292","DOIUrl":"https://doi.org/10.1145/3510450.3517292","url":null,"abstract":"Immersive video with 8K or higher resolution utilizes viewport-dependent tile-based video with multi-resolutions (i.e. low-resolution background video with high-resolution video). OMAF defines how to deliver tiled immersive video through MPEG DASH. But End-to-End latency is a consistent problem for the MPEG DASH solution. Solutions using short segment with 1 sec duration will reduce latency, but even in those cases, without CDNs, the end-to-end latency is still 5 secs or more. And in most cases, massive segment files generated every second harden CDN, leading to much longer latencies, such as 20 secs or more. In this paper, we introduce a solution using Common Media Application Format (CMAF) to deliver tile-based immersive video to reduce the end-to-end latency to sub-3 secs. Based on CMAF: We enabled long duration CMAF segment with shorter End-to-End Latency by using long duration CMAF segmentation reduce CDN pressure since it reduces the amount segment files generated. In addition, we re-fetch relative CMAF chunks of high-resolution segments via our own adaptive viewport prediction algorithm. We use a decoder catching-up mechanism for prediction-missed tiles to reduce the M2HQ (Motion-To-High-Quality) latency while viewport changed within chunks. As we will show, this leads to an overall sub-3 seconds End-to-End latency with ~1 second Packager-Display Latency and average 300ms M2HQ latency can be reached with 5 seconds segmentation in non-CDN environment.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"355 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123548247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overview of the DASH-HLS interoperability specification: 2021 edition","authors":"Zachary A. Cava","doi":"10.1145/3510450.3517281","DOIUrl":"https://doi.org/10.1145/3510450.3517281","url":null,"abstract":"While CMAF has provided the foundation for the interoperable packaging of streaming media, today it is still common practice to produce media specific to the delivery formats utilized by a service provider. As DASH and HLS are the delivery formats the industry has converged towards, a survey of deployments for DASH and HLS revealed two leading reasons for divergent packaging: media packaging requirements that were misaligned across formats and a non-trivial amount of tribal knowledge required to address media for common deployment use-cases in each format. To address the divergence of CMAF packaged media in DASH and HLS, the CTA WAVE project created a working group, the DASH-HLS Interoperability group, responsible for researching and transcribing the additional packaging and delivery format requirements necessary to achieve interoperability. Using industry guidance, the group defined a set of common streaming use-cases and published the interoperability details for the first four usecases in the 2021 Edition of the DASH-HLS Interoperability Specification (CTA-5005) [1]. The use-cases in this edition are: Basic On-Demand and Live Streaming, Low Latency Live Streaming, Encrypted Media Presentations, and Presentation Splicing. This talk will provide an overview of the specification outputs for these initial use-cases including the defined packaging and addressing requirements and any identified missing interoperability points that represent opportunities for further research. Beyond the current specification, this talk will highlight the new use-cases and work currently being prioritized for the next edition and how interested entities can get involved with the development.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128845515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video streaming using light-weight transcoding and in-network intelligence","authors":"A. Erfanian, Hadi Amirpour, F. Tashtarian, C. Timmerer, H. Hellwagner","doi":"10.1145/3510450.3517284","DOIUrl":"https://doi.org/10.1145/3510450.3517284","url":null,"abstract":"In this paper, we introduce a novel approach, LwTE, which reduces streaming costs in HTTP Adaptive Streaming (HAS) by enabling light-weight transcoding at the edge. In LwTE, during encoding of a video segment in the origin server, a metadata is generated which stores the optimal encoding decisions. LwTE enables us to store only the highest bitrate plus corresponding metadata (of very small size) for unpopular video segments/bitrates. Since metadata is of very small size, replacing unpopular video segments/bitrates with their metadata results in considerable saving in the storage costs. The metadata is reused at the edge servers to reduce the required time and computational resources for on-the-fly transcoding.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123791043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low encoding overhead ultra-low latency streaming via HESP through sparse initialization streams","authors":"Pieter-Jan Speelmans","doi":"10.1145/3510450.3517294","DOIUrl":"https://doi.org/10.1145/3510450.3517294","url":null,"abstract":"HESP, the High Efficiency Streaming Protocol [4], realizes ultra-low latencies and ultra-short start-up times by combining two feeds, the keyframe-only Initialization Stream and the ultra-low latency CMAF-CTE Continuation Stream. HESP uses a keyframe from the Initialization Stream to start playback (via keyframe injection) of the Continuation Stream extremely close to the live edge. In previous research [5], the impact of the HESP keyframe injection on the video quality has been proven to be very low or even negligible. In contrast to the trivial double encoding for each quality in the bitrate ladder, in this paper we show that the overhead of the generation of the keyframe-only Initialization Streams can be reduced. We designed an approach in which the frequency of keyframes in the Initialization Streams is defined by a trade-off between the encoding overhead and two metrics in the viewing QoE: start-up time and time that it takes to switch to the highest feasible video quality of the ABR ladder. More specifically, for each quality Qi, fi is defined such that (i) switching to Qi, either for start-up or for switching to Qi as a higher quality, takes [EQUATION] additional delay, and (ii) there always is a Qi, lower than Qcurrent (unless Qcurrent is the lowest quality) to which the player can switch down instantly, which is needed in case of network problems. The resulting impact on the viewer QoE is characterizedby occasional (whenever an ABR switch to a higher quality is needed) short intervals [EQUATION] during which playback potentially is done at a lower than feasible video quality. Based on measurements, the proposed approach results in an overhead when encoding Initialization Streams of only 15 to 20%. Compared to \"standard\" HESP, the viewer QoE reduction is hardly noticeable.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133116819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAdViSE or how to find the sweet spots of ABR systems","authors":"Babak Taraghi, A. Bentaleb, C. Timmerer, Roger Zimmermann, H. Hellwagner","doi":"10.1145/3510450.3517274","DOIUrl":"https://doi.org/10.1145/3510450.3517274","url":null,"abstract":"With the recent surge in Internet multimedia traffic, the enhancement and improvement of media players, specifically Dynamic Adaptive Streaming over HTTP (DASH) media players happened at an incredible rate. DASH Media players take advantage of adapting a media stream to the network fluctuations by continuously monitoring the network and making decisions in near real-time. The performance of algorithms that are in charge of making such decisions was often difficult to be evaluated and objectively assessed from an End-to-end or holistic perspective [1]. CAdViSE provides a Cloud-based Adaptive Video Streaming Evaluation framework for the automated testing of adaptive media players [4]. We will introduce the CAdViSE framework, its application, and propose the benefits and advantages that it can bring to every web-based media player development pipeline. To demonstrate the power of CAdViSE in evaluating Adaptive Bitrate (ABR) algorithms we will exhibit its capabilities when combined with objective Quality of Experience (QoE) models. Our team at Bitmovin Inc. and ATHENA laboratory has selected the ITU-T P.1203 (mode 1) quality evaluation model in order to assess the experiments and calculate the Mean Opinion Score (MOS), and better understand the behavior of a set of well-known ABR algorithms in a real-life setting [2]. We will display how we tested and deployed our framework using a modular architecture into a cloud infrastructure. This method yields a massive growth to the number of concurrent experiments and the number of media players that can be evaluated and compared at the same time, thus enabling maximum potential scalability. In our team's most recent experiments, we used Amazon Web Services (AWS) for demonstration purposes. Another awesome feature of CAdViSE that will be discussed here is the ability to shape the test network with endless network profiles. To do so, we used a fluctuation network profile and a real LTE network trace based on the recorded internet usage of a bicycle commuter in Belgium. CAdViSE produces comprehensive logs for each experimental session. These logs can then be applied against different goals, such as objective evaluation or to stitch back media segments and conduct subjective evaluations. In addition, startup delays, stall events, and other media streaming defects can be imitated exactly as they happened during the experimental streaming sessions [3].","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131238162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latest advances in the development of the open-source player dash.js","authors":"D. Silhavy, S. Pham, S. Arbanowski, S. Steglich, Björn Harrer","doi":"10.1145/3510450.3517311","DOIUrl":"https://doi.org/10.1145/3510450.3517311","url":null,"abstract":"The trend to consume high-quality videos over the internet lead to a high demand for sophisticated and robust video player implementations. dash.js is a prominent option for implementing production grade DASH-based applications and products, and is also widely used for academic research purposes. In this paper, we introduce the latest additions and improvements to dash.js. We focus on various features and use cases such as player performance and robustness, low latency streaming, metric reporting and digital rights management. The features and improvements introduced in this paper provide great benefits not only for media streaming clients, but also for the server-side components involved in the media stream process.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132790514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}