{"title":"A standards-based framework for real-time media in immersive scenes","authors":"Imed Bouazizi, T. Stockhammer","doi":"10.1145/3510450.3517288","DOIUrl":null,"url":null,"abstract":"Immersive media experiences are anticipated to become the norm in entertainment and communication in the near future, enabled by advances in computer graphics, capture and display systems, and networking technology. Immersive experiences are based on a rich 3D scene that enables immersion, fusion with the real world, and rich interactivity. However, 3D scenes are large, rich and complex - and hence stored and processed not only on devices, but on cloud systems. MPEG is currently working on specifying a set of functionalities that address different aspects of immersive media, including formats, access and delivery, and compression of these emerging media types. The scene description standard as defined in part 14 of the MPEG immersive standard family [1] provides the entry point and glue to such immersive experiences. The key design principle of the architecture behind it, was to separate media access from rendering. The scene description standard achieves this by defining a separate Media Access Function (MAF) and the API to access it. The MPEG-I scene description reference architecture is depicted in 1. The MAF receives instructions from the presentation engine on the media referenced in the scene. It uses this information to establish the proper media pipelines to fetch the media and pass it in the desired format to the presentation engine for rendering. The request for media also includes information about the current viewer's position as well as the scene camera position and intrinsic parameters. This enables the MAF to implement a wide range of optimization techniques, such as the adaptation of the retrieved media to the network conditions based on the viewer's position and orientation with regards to the object to be fetched. These adaptations may include partial retrieval, access at different levels of detail, and adjustment of quality. In this paper, we describe the architecture for immersive media and the functionality performed by the MAF to optimize the streaming of immersive media. We discuss the different adaptation options based on a selected set of MPEG formats for 3D content (i.e. video textures, dynamic meshes, and point clouds). We describe possible designs of such adaptation algorithms for real-time media delivery in the example of immersive conferencing.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Mile-High Video Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510450.3517288","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Immersive media experiences are anticipated to become the norm in entertainment and communication in the near future, enabled by advances in computer graphics, capture and display systems, and networking technology. Immersive experiences are built around a rich 3D scene that enables immersion, fusion with the real world, and interactivity. However, 3D scenes are large, rich, and complex, and are hence stored and processed not only on devices but also on cloud systems. MPEG is currently specifying a set of functionalities that address different aspects of immersive media, including formats, access and delivery, and compression of these emerging media types. The scene description standard, defined in Part 14 of the MPEG immersive (MPEG-I) standard family [1], provides the entry point and the glue for such immersive experiences. The key design principle of the underlying architecture was to separate media access from rendering. The scene description standard achieves this by defining a separate Media Access Function (MAF) and the API to access it. The MPEG-I scene description reference architecture is depicted in Figure 1. The MAF receives instructions from the presentation engine about the media referenced in the scene. It uses this information to establish the proper media pipelines to fetch the media and pass it, in the desired format, to the presentation engine for rendering. The request for media also includes information about the current viewer's position as well as the scene camera position and intrinsic parameters. This enables the MAF to implement a wide range of optimization techniques, such as adapting the retrieved media to the network conditions based on the viewer's position and orientation relative to the object being fetched. These adaptations may include partial retrieval, access at different levels of detail, and adjustment of quality. In this paper, we describe the architecture for immersive media and the functionality performed by the MAF to optimize the streaming of immersive media. We discuss the different adaptation options based on a selected set of MPEG formats for 3D content (i.e., video textures, dynamic meshes, and point clouds). We describe possible designs of such adaptation algorithms for real-time media delivery, using the example of immersive conferencing.
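To make the decoupling concrete, the sketch below outlines how a MAF-style component might expose media to a presentation engine. All names here (MediaAccessFunction, MediaReference, ViewInfo, MediaPipeline) are hypothetical illustrations, not the normative MPEG-I scene description API; the point is only that the engine declares what it needs and in which format, while fetching, decoding, and format conversion stay behind the MAF boundary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MediaReference:
    """A media item referenced from the scene description document."""
    uri: str            # where to fetch the media from
    media_type: str     # e.g. "video/mp4" or a point-cloud MIME type
    target_format: str  # buffer format the presentation engine expects


@dataclass
class ViewInfo:
    """Viewer pose and camera parameters passed with each request."""
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float, float]  # quaternion
    intrinsics: Dict[str, float]                    # e.g. {"fov_deg": 90.0}


class MediaPipeline:
    """Stub pipeline: fetch -> decode -> convert to the requested format."""

    def __init__(self, ref: MediaReference):
        self.ref = ref
        self.view = None

    def update_view(self, view: ViewInfo) -> None:
        # The pose lets the pipeline adapt what it fetches next
        # (partial retrieval, lower level of detail, reduced quality).
        self.view = view

    def next_buffer(self) -> bytes:
        # Placeholder: a real pipeline would return decoded data already
        # converted to self.ref.target_format.
        return b""


class MediaAccessFunction:
    """Sketch of a MAF: owns media access so the engine only renders."""

    def __init__(self):
        self._pipelines: Dict[str, MediaPipeline] = {}

    def request_media(self, ref: MediaReference, view: ViewInfo) -> MediaPipeline:
        # Reuse an existing pipeline for this URI or set up a new one.
        pipeline = self._pipelines.setdefault(ref.uri, MediaPipeline(ref))
        pipeline.update_view(view)
        return pipeline
```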
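The viewer pose and camera intrinsics carried in each request are what make pose-driven adaptation possible. Below is a minimal sketch of one such heuristic, assuming detail is bucketed by the projected screen coverage of an object's bounding sphere; the function name and pixel thresholds are invented for illustration, and a real MAF would also weigh network throughput, buffer state, and occlusion.

```python
import math


def select_level_of_detail(viewer_pos, object_pos, object_radius,
                           fov_deg, screen_height_px,
                           lod_thresholds_px=(400, 100, 20)):
    """Pick an LOD index (0 = full detail) from projected screen coverage."""
    distance = math.dist(viewer_pos, object_pos)
    if distance <= object_radius:
        return 0  # viewer is inside the object's bounds: full detail

    # Angular diameter of the bounding sphere, clamped for numeric safety.
    angular = 2.0 * math.asin(min(1.0, object_radius / distance))

    # Approximate vertical pixel coverage from the camera's field of view.
    pixels = angular / math.radians(fov_deg) * screen_height_px

    # Bucket coverage into discrete LODs: bigger on screen -> finer detail.
    for lod, threshold in enumerate(lod_thresholds_px):
        if pixels >= threshold:
            return lod
    return len(lod_thresholds_px)  # coarsest representation


# Example: a 0.5 m point-cloud avatar seen from 3 m on a 1080 px, 90-degree
# FOV display covers roughly 230 px, so an intermediate LOD is chosen.
print(select_level_of_detail((0, 1.6, 0), (0, 1.6, 3), 0.5, 90, 1080))
```

In an immersive conferencing session, the MAF could re-run such a check as participants move, fetching distant avatars as coarse point clouds and switching to denser representations only when they dominate the viewport.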