Andrew Zhang, XiaoMing Chen, Ying Luo, Anna Qingfeng Li, William Cheung
Proceedings of the 1st Mile-High Video Conference, March 2022. DOI: 10.1145/3510450.3517292
Using CMAF to deliver high resolution immersive video with ultra-low end to end latency for live streaming
Immersive video at 8K or higher resolution uses viewport-dependent, tile-based delivery at multiple resolutions (i.e., a low-resolution background video combined with high-resolution tiles). OMAF defines how to deliver tiled immersive video over MPEG-DASH, but end-to-end latency is a persistent problem for DASH-based solutions. Using short segments of 1-second duration reduces latency, yet even then, without a CDN, the end-to-end latency is still 5 seconds or more. And in most cases, the massive number of segment files generated every second burdens the CDN, leading to much longer latencies of 20 seconds or more. In this paper, we introduce a solution that uses the Common Media Application Format (CMAF) to deliver tile-based immersive video and reduce end-to-end latency to under 3 seconds. Building on CMAF, we enable long-duration segments with short end-to-end latency: long-duration CMAF segmentation reduces CDN pressure because it reduces the number of segment files generated, while chunked delivery keeps latency low. In addition, we re-fetch the relevant CMAF chunks of high-resolution segments via our own adaptive viewport-prediction algorithm, and we use a decoder catch-up mechanism for prediction-missed tiles to reduce the Motion-To-High-Quality (M2HQ) latency when the viewport changes within a chunk. As we will show, this yields an overall end-to-end latency under 3 seconds, with roughly 1 second of packager-to-display latency and an average M2HQ latency of 300 ms, using 5-second segments in a non-CDN environment.
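The chunk-level re-fetch driven by viewport prediction can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear-extrapolation predictor, the 4x2 tile grid, and the URL template are all assumptions made for the example (the paper's adaptive prediction algorithm is not specified in the abstract).

```python
# Sketch: predict the viewer's yaw for the next chunk, map the predicted
# field of view to tile columns, and build the high-resolution CMAF chunk
# requests for just those tiles.

TILE_COLS, TILE_ROWS = 4, 2          # assumed tile grid over 360 x 180 degrees

def predict_yaw(samples, horizon_s):
    """Linearly extrapolate yaw (degrees) from the last two (time, yaw) samples."""
    (t0, y0), (t1, y1) = samples[-2], samples[-1]
    rate = (y1 - y0) / (t1 - t0)     # angular velocity, degrees per second
    return (y1 + rate * horizon_s) % 360

def tiles_for_viewport(yaw, fov=110):
    """Return sorted column indices of tiles overlapping the predicted FoV."""
    deg_per_tile = 360 / TILE_COLS
    lo, hi = yaw - fov / 2, yaw + fov / 2
    cols = set()
    d = lo
    while d <= hi:                   # sample at half-tile steps across the FoV
        cols.add(int((d % 360) // deg_per_tile))
        d += deg_per_tile / 2
    cols.add(int((hi % 360) // deg_per_tile))  # make sure the far edge is covered
    return sorted(cols)

def chunk_urls(segment_idx, chunk_idx, cols):
    """Hypothetical URL template for high-res CMAF chunks of the given tiles."""
    return [f"/hq/tile_{c}/seg_{segment_idx}/chunk_{chunk_idx}.m4s"
            for c in cols]

yaw = predict_yaw([(0.0, 10.0), (0.5, 20.0)], horizon_s=0.5)  # -> 30.0 degrees
urls = chunk_urls(12, 3, tiles_for_viewport(yaw))
```

Fetching at chunk rather than segment granularity is what makes the re-fetch cheap: on a prediction miss, only the remaining chunks of the affected high-resolution tiles need to be requested, not a whole 5-second segment.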
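The trade-off the abstract describes, where long segments normally add latency but chunking removes that penalty, can be made concrete with a back-of-envelope calculation. The tile count (32) and chunk duration (0.5 s) below are illustrative assumptions, not figures from the paper.

```python
# Why chunked CMAF lets long segments coexist with low latency: the player
# waits only for the first chunk of a segment, not the whole segment, while
# the packager emits far fewer files per minute.

def segment_files_per_minute(duration_s, tiles):
    """Segment files the packager emits per minute of live content."""
    return int(60 / duration_s) * tiles

def startup_wait(duration_s, chunk_s=None):
    """Earliest time (s) a media unit becomes available to the player."""
    return chunk_s if chunk_s is not None else duration_s

# 1 s DASH segments: many files per minute, 1 s wait per segment.
short_files = segment_files_per_minute(1, tiles=32)   # 1920 files/min
short_wait = startup_wait(1)                          # 1.0 s

# 5 s CMAF segments with 0.5 s chunks: 5x fewer files, shorter wait.
long_files = segment_files_per_minute(5, tiles=32)    # 384 files/min
long_wait = startup_wait(5, chunk_s=0.5)              # 0.5 s
```

Under these assumed numbers, long-duration CMAF segmentation cuts the file count by 5x (easing origin and CDN pressure) while chunked availability keeps the per-unit wait below the segment duration, consistent with the sub-3-second end-to-end figure reported.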