SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length

arXiv - CS - Multimedia, Pub Date: 2024-09-12, DOI: arxiv-2409.07759

Bangya Liu, Suman Banerjee


Abstract

3D Gaussian Splatting (3DGS) has recently garnered significant attention in computer vision and computer graphics due to its high rendering speed and remarkable quality. While existing research has sought to extend 3DGS from static to dynamic scenes, such efforts have been consistently impeded by excessive model sizes, constraints on video duration, and content deviation. These limitations significantly compromise the streamability of dynamic 3D Gaussian models, thereby restricting their utility in downstream applications, including volumetric video, autonomous vehicles, and immersive technologies such as virtual, augmented, and mixed reality. This paper introduces SwinGS, a novel framework for training, delivering, and rendering volumetric video in a real-time streaming fashion. To address the aforementioned challenges and enhance streamability, SwinGS integrates spacetime Gaussians with Markov Chain Monte Carlo (MCMC) to adapt the model to the various 3D scenes across frames, while employing a sliding window that captures Gaussian snapshots for each frame in an accumulative way. We implement a prototype of SwinGS and demonstrate its streamability across various datasets and scenes. Additionally, we develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers, including smartphones and tablets. Experimental results show that SwinGS reduces transmission costs by 83.6% compared to previous work, with a negligible loss in PSNR. Moreover, SwinGS easily scales to long video sequences without compromising quality.
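The abstract does not spell out how the sliding window operates, so the following is only a minimal illustrative sketch of a generic sliding-window buffer, not the SwinGS implementation. It assumes, purely for illustration, that each frame delivers a small batch of new Gaussians and that a batch stays active for a fixed number of frames before being retired. The class name `SlidingWindowBuffer`, the `window` parameter, and the use of integer IDs in place of real Gaussian parameters are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class SlidingWindowBuffer:
    """Hypothetical client-side buffer: keeps only batches born in the last `window` frames."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        # Each entry is (birth_frame, IDs of the Gaussians delivered with that frame).
        self._batches: Deque[Tuple[int, List[int]]] = deque()

    def push(self, frame: int, new_gaussians: List[int]) -> List[int]:
        """Store the batch streamed for `frame`, retire expired batches,
        and return the IDs that are active when rendering `frame`."""
        self._batches.append((frame, list(new_gaussians)))
        # A batch born at frame b is assumed active for frames b .. b + window - 1.
        while self._batches and self._batches[0][0] <= frame - self.window:
            self._batches.popleft()
        return [g for _, batch in self._batches for g in batch]


buf = SlidingWindowBuffer(window=3)
active_0 = buf.push(0, [0, 1])  # two Gaussians arrive with frame 0
active_1 = buf.push(1, [2])     # one more arrives with frame 1
active_2 = buf.push(2, [3])
active_3 = buf.push(3, [4])     # the frame-0 batch falls out of the window here
```

Under these assumptions, the per-frame payload is just the new batch, while the rendered set accumulates over the window. This is one plausible reading of how a sliding window could keep transmission bounded for videos of arbitrary length.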


