Andrew Zhang, XiaoMing Chen, Ying Luo, Anna Qingfeng Li, William Cheung
Proceedings of the 1st Mile-High Video Conference, March 2022. DOI: 10.1145/3510450.3517292
Using CMAF to deliver high resolution immersive video with ultra-low end to end latency for live streaming
Immersive video at 8K or higher resolution uses viewport-dependent, tile-based delivery at multiple resolutions (i.e., a low-resolution background video combined with high-resolution tiles). OMAF defines how to deliver tiled immersive video over MPEG-DASH, but end-to-end latency is a persistent problem for DASH-based solutions. Using short segments of 1-second duration reduces latency, yet even then, without a CDN, the end-to-end latency is still 5 seconds or more. And in most cases, the massive number of segment files generated every second burdens the CDN, leading to much longer latencies of 20 seconds or more. In this paper, we introduce a solution that uses the Common Media Application Format (CMAF) to deliver tile-based immersive video and reduce end-to-end latency to under 3 seconds. Building on CMAF, we enable long-duration segments with short end-to-end latency: long-duration CMAF segmentation reduces CDN pressure because it reduces the number of segment files generated, while chunked delivery keeps latency low. In addition, we re-fetch the relevant CMAF chunks of high-resolution segments via our own adaptive viewport-prediction algorithm, and we use a decoder catch-up mechanism for prediction-missed tiles to reduce the Motion-To-High-Quality (M2HQ) latency when the viewport changes within a chunk. As we will show, this yields an overall end-to-end latency under 3 seconds, with roughly 1 second of packager-to-display latency and an average M2HQ latency of 300 ms, using 5-second segments in a non-CDN environment.
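The chunk-level re-fetch driven by viewport prediction can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear-extrapolation predictor, the 4x2 tile grid, and the URL template are all assumptions made for the example (the paper's adaptive prediction algorithm is not specified in the abstract).

```python
# Sketch: predict the viewer's yaw for the next chunk, map the predicted
# field of view to tile columns, and build the high-resolution CMAF chunk
# requests for just those tiles.

TILE_COLS, TILE_ROWS = 4, 2          # assumed tile grid over 360 x 180 degrees

def predict_yaw(samples, horizon_s):
    """Linearly extrapolate yaw (degrees) from the last two (time, yaw) samples."""
    (t0, y0), (t1, y1) = samples[-2], samples[-1]
    rate = (y1 - y0) / (t1 - t0)     # angular velocity, degrees per second
    return (y1 + rate * horizon_s) % 360

def tiles_for_viewport(yaw, fov=110):
    """Return sorted column indices of tiles overlapping the predicted FoV."""
    deg_per_tile = 360 / TILE_COLS
    lo, hi = yaw - fov / 2, yaw + fov / 2
    cols = set()
    d = lo
    while d <= hi:                   # sample at half-tile steps across the FoV
        cols.add(int((d % 360) // deg_per_tile))
        d += deg_per_tile / 2
    cols.add(int((hi % 360) // deg_per_tile))  # make sure the far edge is covered
    return sorted(cols)

def chunk_urls(segment_idx, chunk_idx, cols):
    """Hypothetical URL template for high-res CMAF chunks of the given tiles."""
    return [f"/hq/tile_{c}/seg_{segment_idx}/chunk_{chunk_idx}.m4s"
            for c in cols]

yaw = predict_yaw([(0.0, 10.0), (0.5, 20.0)], horizon_s=0.5)  # -> 30.0 degrees
urls = chunk_urls(12, 3, tiles_for_viewport(yaw))
```

Fetching at chunk rather than segment granularity is what makes the re-fetch cheap: on a prediction miss, only the remaining chunks of the affected high-resolution tiles need to be requested, not a whole 5-second segment.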
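The trade-off the abstract describes, where long segments normally add latency but chunking removes that penalty, can be made concrete with a back-of-envelope calculation. The tile count (32) and chunk duration (0.5 s) below are illustrative assumptions, not figures from the paper.

```python
# Why chunked CMAF lets long segments coexist with low latency: the player
# waits only for the first chunk of a segment, not the whole segment, while
# the packager emits far fewer files per minute.

def segment_files_per_minute(duration_s, tiles):
    """Segment files the packager emits per minute of live content."""
    return int(60 / duration_s) * tiles

def startup_wait(duration_s, chunk_s=None):
    """Earliest time (s) a media unit becomes available to the player."""
    return chunk_s if chunk_s is not None else duration_s

# 1 s DASH segments: many files per minute, 1 s wait per segment.
short_files = segment_files_per_minute(1, tiles=32)   # 1920 files/min
short_wait = startup_wait(1)                          # 1.0 s

# 5 s CMAF segments with 0.5 s chunks: 5x fewer files, shorter wait.
long_files = segment_files_per_minute(5, tiles=32)    # 384 files/min
long_wait = startup_wait(5, chunk_s=0.5)              # 0.5 s
```

Under these assumed numbers, long-duration CMAF segmentation cuts the file count by 5x (easing origin and CDN pressure) while chunked availability keeps the per-unit wait below the segment duration, consistent with the sub-3-second end-to-end figure reported.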