{"title":"Adaptive streaming of 3D content for web-based virtual reality: an open-source prototype including several metrics and strategies","authors":"Jean-Philippe Farrugia, Luc Billaud, G. Lavoué","doi":"10.1145/3587819.3592555","DOIUrl":"https://doi.org/10.1145/3587819.3592555","url":null,"abstract":"Virtual reality is a new technology that has been developing a lot during the last decade. With autonomous head-mounted displays appearing on the market, new uses and needs have been created. The 3D content displayed by those devices can now be stored on distant servers rather than directly in the device's memory. In such networked immersive experiences, the 3D environment has to be streamed in real-time to the headset. In that context, several recent papers proposed utility metrics and selection strategies to schedule the streaming of the different objects composing the 3D environment, in order to minimize the latency and to optimize the quality of what is being visualized by the user at each moment. However, these proposed frameworks are hardly comparable since they operate on different systems and data. Therefore, we hereby propose an open-source DASH-based web framework for adaptive streaming of 3D content in a 6 Degrees of Freedom (DoFs) scenario. Our framework integrates several strategies and utility metrics from the state of the art, as well as several relevant features: 3D graphics compression, levels of details and the use of a visual quality index. We used our software to demonstrate the relevance of those tools and provide useful hints for the community for the further improvements of 3D streaming systems.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123272694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"360 Video DASH Dataset","authors":"Darijo Raca, Yogita Jadhav, Jason J. Quinlan, A. Zahran","doi":"10.1145/3587819.3592548","DOIUrl":"https://doi.org/10.1145/3587819.3592548","url":null,"abstract":"Different industries are observing the positive impact of 360 video on the user experience. However, the performance of VR systems continues to fall short of customer expectations. Therefore, more research into various design elements for VR streaming systems is required. This study introduces a SW tool that offers straight-forward encoding platforms to simplify the encoding of DASH VR videos. In addition, we developed a dataset composed of 9 VR videos encoded with seven tiling configurations, four segment durations, and up to four different bitrates. A corresponding tile size dataset is also provided, which can be utilised to power network simulations or trace-driven emulations. We analysed the traffic load of various films and encoding setups using the dataset that was presented. Our research indicates that, while smaller tile sizes reduce traffic load, video decoding may require more computational power.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126728274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 6DoF VR Dataset of 3D virtualWorld for Privacy-Preserving Approach and Utility-Privacy Tradeoff","authors":"Yu-Szu Wei, Xing Wei, Shin-Yi Zheng, Cheng-Hsin Hsu, Chenyang Yang","doi":"10.1145/3587819.3592557","DOIUrl":"https://doi.org/10.1145/3587819.3592557","url":null,"abstract":"Virtual Reality (VR) applications offer an immersive user experience at the expense of privacy leakage caused by inevitably streaming various new types of user data. While some privacy-preserving approaches have been proposed for protecting one type of data, how to design and evaluate approaches for multiple types of user data are still open. On the other hand, preserving privacy will degrade the quality of experience of VR applications or say the utility of user data. How to achieve efficient utility-privacy tradeoff with multiple types of data is also open. Both call for a dataset that contains multiple types of user data and personal attributes of users as ground-truth values. In this paper, we collect a 6 degree-of-freedom VR dataset of 3D virtual worlds for the investigation of privacy-preserving approaches and utility-privacy tradeoff.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122192203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Predict Head Pose in Remotely-Rendered Virtual Reality","authors":"G. Illahi, Ashutosh Vaishnav, Teemu Kämäräinen, M. Siekkinen, Mario Di Francesco","doi":"10.1145/3587819.3590972","DOIUrl":"https://doi.org/10.1145/3587819.3590972","url":null,"abstract":"Accurate characterization of Head Mounted Display (HMD) pose in a virtual scene is essential for rendering immersive graphics in Extended Reality (XR). Remote rendering employs servers in the cloud or at the edge of the network to overcome the computational limitations of either standalone or tethered HMDs. Unfortunately, it increases the latency experienced by the user; for this reason, predicting HMD pose in advance is highly beneficial, as long as it achieves high accuracy. This work provides a thorough characterization of solutions that forecast HMD pose in remotely-rendered virtual reality (VR) by considering six degrees of freedom. Specifically, it provides an extensive evaluation of pose representations, forecasting methods, machine learning models, and the use of multiple modalities along with joint and separate training. In particular, a novel three-point representation of pose is introduced together with a data fusion scheme for long-term short-term memory (LSTM) neural networks. Our findings show that machine learning models benefit from using multiple modalities, even though simple statistical models perform surprisingly well. Moreover, joint training is comparable to separate training with carefully chosen pose representation and data fusion strategies.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124639931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-automatic mulsemedia authoring analysis from the user's perspective","authors":"R. Abreu, D. Mattos, J. Santos, George Guinea, D. Muchaluat-Saade","doi":"10.1145/3587819.3590979","DOIUrl":"https://doi.org/10.1145/3587819.3590979","url":null,"abstract":"Mulsemedia (Multiple Sensorial Media) authoring is a complex task that requires the author to scan the media content to identify the moments to activate sensory effects. A novel proposal is to integrate content recognition algorithms into authoring tools to alleviate the authoring effort. Such algorithms could potentially replace the work of the human author when analyzing audiovisual content, by performing automatic extraction of sensory effects. Besides that, the semi-automatic method proposes to maintain the author subjectivity, allowing the author to define which sensory effects should be automatically extracted. This paper presents an evaluation of the proposed semi-automatic authoring considering the point of view of users. Experiments were done with the STEVE 2.0 mulsemedia authoring tool. Our work uses the GQM (Goal Question Metric) methodology, a questionnaire for collecting users' feedback, and analyzes the results. We conclude that users believe that the semi-automatic authoring is a positive addition to the authoring method.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security-Preserving Live 3D Video Surveillance","authors":"Zhongze Tang, Huy Phan, Xianglong Feng, Bo Yuan, Yao Liu, Sheng Wei","doi":"10.1145/3587819.3590975","DOIUrl":"https://doi.org/10.1145/3587819.3590975","url":null,"abstract":"3D video surveillance has become the new trend in security monitoring with the popularity of 3D depth cameras in the consumer market. While enabling more fruitful surveillance features, the finer-grained 3D videos being captured would raise new security concerns that have not been addressed by existing research. This paper explores the security implications of live 3D surveillance videos in triggering biometrics-related attacks, such as face ID spoofing. We demonstrate that the state-of-the-art face authentication systems can be effectively compromised by the 3D face models presented in the surveillance video. Then, to defend against such face spoofing attacks, we propose to proactively and benignly inject adversarial perturbations to the surveillance video in real time, prior to the exposure to potential adversaries. Such dynamically generated perturbations can prevent the face models from being exploited to bypass deep learning-based face authentications while maintaining the required quality and functionality of the 3D video surveillance. We evaluate the proposed perturbation generation approach on both an RGB-D dataset and a 3D video dataset, which justifies its effective security protection, low quality degradation, and real-time performance.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123607223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The World is Too Big to Download: 3D Model Retrieval for World-Scale Augmented Reality","authors":"Yi-Zhen Tsai, James Luo, Yunshu Wang, Jiasi Chen","doi":"10.1145/3587819.3590970","DOIUrl":"https://doi.org/10.1145/3587819.3590970","url":null,"abstract":"World-scale augmented reality (AR) is a form of AR where users move around the real world, viewing and interacting with 3D models at specific locations. However, given the geographical scale of world-scale AR, pre-fetching and storing numerous high-quality 3D models locally on the device is infeasible. For example, it would be impossible to download and store 3D ads from all the storefronts in a city onto a single device. A key challenge is thus deciding which remotely-stored 3D models should be fetched onto the AR device from an edge server, in order to render them in a timely fashion - yet with high visual quality - on the display. In this work, we propose a 3D model retrieval framework that makes intelligent decisions of which quality of 3D models to fetch, and when. The optimization decision is based on quality-compression tradeoffs, network bandwidth, and predictions of which 3D models the AR user is likely to view next. To support our framework, we collect real-world traces of AR users playing a world-scale AR game, and use this to drive our simulation and prediction modules. Our results show that the proposed framework can achieve higher visual quality of the 3D models while missing fewer display deadlines (by 20%) and wasting fewer bytes (by 10x), compared to a baseline approach of pre-fetching models within a fixed distance of the user.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130444666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Source Toolkit for Live End-to-End 4K VVC Intra Coding","authors":"Marko Viitanen, Joose Sainio, Alexandre Mercat, Guillaume Gautier, Jarno Vanne, Ibrahim Farhat, Pierre-Loup Cabarat, W. Hamidouche, D. Ménard","doi":"10.1145/3587819.3593938","DOIUrl":"https://doi.org/10.1145/3587819.3593938","url":null,"abstract":"Versatile Video Coding (VVC/H.266) takes video coding to the next level by doubling the coding efficiency over its predecessors for the same subjective quality, but at the cost of immense coding complexity. Therefore, VVC calls for aggressively optimized codecs to make it feasible for live streaming media applications. This paper introduces the first public end-to-end (E2E) pipeline for live 4K30p VVC intra coding and streaming. The pipeline is made up of three open-source components: 1) uvg266 for VVC encoding; 2) uvgRTP for VVC streaming; and 3) OpenVVC for VVC decoding. The proposed setup is demonstrated with a proof-of-concept prototype that implements the encoder end on AMD ThreadRipper 2990WX and the decoder end on Nvidia Jetson AGX Orin. Our prototype is almost 34 000 times as fast as the corresponding E2E pipeline built around the VTM codec. Respectively, it achieves 3.3 times speedup without any significant coding overhead over the pipeline that utilizes the fastest possible configuration of the well-known VVenC/VVdeC codec. These results indicate that our prototype is currently the only viable open-source solution for live 4K VVC intra coding and streaming.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124568457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"patchVVC: A Real-time Compression Framework for Streaming Volumetric Videos","authors":"Ru Chen, Mengbai Xiao, Dongxiao Yu, Guanghui Zhang, Yao Liu","doi":"10.1145/3587819.3590983","DOIUrl":"https://doi.org/10.1145/3587819.3590983","url":null,"abstract":"Nowadays, volumetric video has emerged as an attractive multimedia application, which provides highly immersive watching experiences. However, streaming the volumetric video demands prohibitively high bandwidth. Thus, effectively compressing its underlying point cloud frames is essential to deploying the volumetric videos. The existing compression techniques are either 3D-based or 2D-based, but they still have drawbacks when being deployed in practice. The 2D-based methods compress the videos in an effective but slow manner, while the 3D-based methods feature high coding speeds but low compression ratios. In this paper, we propose patchVVC, a 3D-based compression framework that reaches both a high compression ratio and a real-time decoding speed. More importantly, patchVVC is designed based on point cloud patches, which makes it friendly to an field of view adaptive streaming system that further reduces the bandwidth demands. The evaluation shows patchVCC achieves the real-time decoding speed and the comparable compression ratios as the representative 2D-based scheme, V-PCC, in an FoV-adaptive streaming scenario.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114503506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Low Bit-Rate MPEG V-PCC-encoded Volumetric Video Streaming with 3D Sub-sampling","authors":"Yuang Shi, Pranav Venkatram, Yifan Ding, Wei Tsang Ooi","doi":"10.1145/3587819.3590981","DOIUrl":"https://doi.org/10.1145/3587819.3590981","url":null,"abstract":"MPEG's Video-based Point Cloud Compression (V-PCC) is a recent new standard for volumetric video compression. By mapping a 3D dynamic point cloud to a 2D image sequence, V-PCC can rely on state-of-the-art video codecs to achieve high compression rate while maintaining the visual fidelity of the point cloud sequence. The quality of a compressed point cloud degrades steeply, however, below the operational bit-rate range of the video codec. In this work, we show that redundant information inherent in a 3D point cloud can be exploited to further extend the bit-rate range of the V-PCC codec, enabling it to operate in a low bit-rate scenario that is important in the context of volumetric video streaming. By simplifying the 3D point clouds through down-sampling and down-scaling during the encoding phase, and reversing the process during the decoding phase, we show that V-PCC could achieve up to 2.1 dB improvement in peak signal-to-noise ratio (PSNR), 7.1% improvement in structural similarity index (SSIM) and 14.8 improvement in video multimethod assessment fusion (VMAF) of the rendered point clouds at the same bit-rate and correspondingly up to 48.5% lower bit-rate at the same image quality.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130720514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}