{"title":"A standards-based framework for real-time media in immersive scenes","authors":"Imed Bouazizi, T. Stockhammer","doi":"10.1145/3510450.3517288","DOIUrl":null,"url":null,"abstract":"Immersive media experiences are anticipated to become the norm in entertainment and communication in the near future, enabled by advances in computer graphics, capture and display systems, and networking technology. Immersive experiences are based on a rich 3D scene that enables immersion, fusion with the real world, and rich interactivity. However, 3D scenes are large, rich and complex - and hence stored and processed not only on devices, but on cloud systems. MPEG is currently working on specifying a set of functionalities that address different aspects of immersive media, including formats, access and delivery, and compression of these emerging media types. The scene description standard as defined in part 14 of the MPEG immersive standard family [1] provides the entry point and glue to such immersive experiences. The key design principle of the architecture behind it, was to separate media access from rendering. The scene description standard achieves this by defining a separate Media Access Function (MAF) and the API to access it. The MPEG-I scene description reference architecture is depicted in 1. The MAF receives instructions from the presentation engine on the media referenced in the scene. It uses this information to establish the proper media pipelines to fetch the media and pass it in the desired format to the presentation engine for rendering. The request for media also includes information about the current viewer's position as well as the scene camera position and intrinsic parameters. This enables the MAF to implement a wide range of optimization techniques, such as the adaptation of the retrieved media to the network conditions based on the viewer's position and orientation with regards to the object to be fetched. These adaptations may include partial retrieval, access at different levels of detail, and adjustment of quality. In this paper, we describe the architecture for immersive media and the functionality performed by the MAF to optimize the streaming of immersive media. We discuss the different adaptation options based on a selected set of MPEG formats for 3D content (i.e. video textures, dynamic meshes, and point clouds). We describe possible designs of such adaptation algorithms for real-time media delivery in the example of immersive conferencing.","PeriodicalId":122386,"journal":{"name":"Proceedings of the 1st Mile-High Video Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Mile-High Video Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510450.3517288","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Immersive media experiences are anticipated to become the norm in entertainment and communication in the near future, enabled by advances in computer graphics, capture and display systems, and networking technology. Immersive experiences are built around a rich 3D scene that enables immersion, fusion with the real world, and interactivity. However, 3D scenes are large, rich, and complex, and are hence stored and processed not only on devices but also on cloud systems. MPEG is currently specifying a set of functionalities that address different aspects of immersive media, including formats, access and delivery, and compression of these emerging media types. The scene description standard, defined in Part 14 of the MPEG immersive (MPEG-I) standard family [1], provides the entry point and the glue for such immersive experiences. The key design principle of the underlying architecture was to separate media access from rendering. The scene description standard achieves this by defining a separate Media Access Function (MAF) and the API to access it. The MPEG-I scene description reference architecture is depicted in Figure 1. The MAF receives instructions from the presentation engine about the media referenced in the scene. It uses this information to establish the proper media pipelines to fetch the media and pass it, in the desired format, to the presentation engine for rendering. The request for media also includes information about the current viewer's position as well as the scene camera position and intrinsic parameters. This enables the MAF to implement a wide range of optimization techniques, such as adapting the retrieved media to the network conditions based on the viewer's position and orientation relative to the object being fetched. These adaptations may include partial retrieval, access at different levels of detail, and adjustment of quality. In this paper, we describe the architecture for immersive media and the functionality performed by the MAF to optimize the streaming of immersive media. We discuss the different adaptation options based on a selected set of MPEG formats for 3D content (i.e., video textures, dynamic meshes, and point clouds). We describe possible designs of such adaptation algorithms for real-time media delivery, using the example of immersive conferencing.
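To make the decoupling concrete, the sketch below outlines how a MAF-style component might expose media to a presentation engine. All names here (MediaAccessFunction, MediaReference, ViewInfo, MediaPipeline) are hypothetical illustrations, not the normative MPEG-I scene description API; the point is only that the engine declares what it needs and in which format, while fetching, decoding, and format conversion stay behind the MAF boundary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MediaReference:
    """A media item referenced from the scene description document."""
    uri: str            # where to fetch the media from
    media_type: str     # e.g. "video/mp4" or a point-cloud MIME type
    target_format: str  # buffer format the presentation engine expects


@dataclass
class ViewInfo:
    """Viewer pose and camera parameters passed with each request."""
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float, float]  # quaternion
    intrinsics: Dict[str, float]                    # e.g. {"fov_deg": 90.0}


class MediaPipeline:
    """Stub pipeline: fetch -> decode -> convert to the requested format."""

    def __init__(self, ref: MediaReference):
        self.ref = ref
        self.view = None

    def update_view(self, view: ViewInfo) -> None:
        # The pose lets the pipeline adapt what it fetches next
        # (partial retrieval, lower level of detail, reduced quality).
        self.view = view

    def next_buffer(self) -> bytes:
        # Placeholder: a real pipeline would return decoded data already
        # converted to self.ref.target_format.
        return b""


class MediaAccessFunction:
    """Sketch of a MAF: owns media access so the engine only renders."""

    def __init__(self):
        self._pipelines: Dict[str, MediaPipeline] = {}

    def request_media(self, ref: MediaReference, view: ViewInfo) -> MediaPipeline:
        # Reuse an existing pipeline for this URI or set up a new one.
        pipeline = self._pipelines.setdefault(ref.uri, MediaPipeline(ref))
        pipeline.update_view(view)
        return pipeline
```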
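The viewer pose and camera intrinsics carried in each request are what make pose-driven adaptation possible. Below is a minimal sketch of one such heuristic, assuming detail is bucketed by the projected screen coverage of an object's bounding sphere; the function name and pixel thresholds are invented for illustration, and a real MAF would also weigh network throughput, buffer state, and occlusion.

```python
import math


def select_level_of_detail(viewer_pos, object_pos, object_radius,
                           fov_deg, screen_height_px,
                           lod_thresholds_px=(400, 100, 20)):
    """Pick an LOD index (0 = full detail) from projected screen coverage."""
    distance = math.dist(viewer_pos, object_pos)
    if distance <= object_radius:
        return 0  # viewer is inside the object's bounds: full detail

    # Angular diameter of the bounding sphere, clamped for numeric safety.
    angular = 2.0 * math.asin(min(1.0, object_radius / distance))

    # Approximate vertical pixel coverage from the camera's field of view.
    pixels = angular / math.radians(fov_deg) * screen_height_px

    # Bucket coverage into discrete LODs: bigger on screen -> finer detail.
    for lod, threshold in enumerate(lod_thresholds_px):
        if pixels >= threshold:
            return lod
    return len(lod_thresholds_px)  # coarsest representation


# Example: a 0.5 m point-cloud avatar seen from 3 m on a 1080 px, 90-degree
# FOV display covers roughly 230 px, so an intermediate LOD is chosen.
print(select_level_of_detail((0, 1.6, 0), (0, 1.6, 3), 0.5, 90, 1080))
```

In an immersive conferencing session, the MAF could re-run such a check as participants move, fetching distant avatars as coarse point clouds and switching to denser representations only when they dominate the viewport.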