{"title":"Adaptive streaming of 3D content for web-based virtual reality: an open-source prototype including several metrics and strategies","authors":"Jean-Philippe Farrugia, Luc Billaud, G. Lavoué","doi":"10.1145/3587819.3592555","DOIUrl":"https://doi.org/10.1145/3587819.3592555","url":null,"abstract":"Virtual reality is a new technology that has been developing a lot during the last decade. With autonomous head-mounted displays appearing on the market, new uses and needs have been created. The 3D content displayed by those devices can now be stored on distant servers rather than directly in the device's memory. In such networked immersive experiences, the 3D environment has to be streamed in real-time to the headset. In that context, several recent papers proposed utility metrics and selection strategies to schedule the streaming of the different objects composing the 3D environment, in order to minimize the latency and to optimize the quality of what is being visualized by the user at each moment. However, these proposed frameworks are hardly comparable since they operate on different systems and data. Therefore, we hereby propose an open-source DASH-based web framework for adaptive streaming of 3D content in a 6 Degrees of Freedom (DoFs) scenario. Our framework integrates several strategies and utility metrics from the state of the art, as well as several relevant features: 3D graphics compression, levels of details and the use of a visual quality index. We used our software to demonstrate the relevance of those tools and provide useful hints for the community for the further improvements of 3D streaming systems.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123272694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"360 Video DASH Dataset","authors":"Darijo Raca, Yogita Jadhav, Jason J. Quinlan, A. Zahran","doi":"10.1145/3587819.3592548","DOIUrl":"https://doi.org/10.1145/3587819.3592548","url":null,"abstract":"Different industries are observing the positive impact of 360 video on the user experience. However, the performance of VR systems continues to fall short of customer expectations. Therefore, more research into various design elements for VR streaming systems is required. This study introduces a SW tool that offers straight-forward encoding platforms to simplify the encoding of DASH VR videos. In addition, we developed a dataset composed of 9 VR videos encoded with seven tiling configurations, four segment durations, and up to four different bitrates. A corresponding tile size dataset is also provided, which can be utilised to power network simulations or trace-driven emulations. We analysed the traffic load of various films and encoding setups using the dataset that was presented. Our research indicates that, while smaller tile sizes reduce traffic load, video decoding may require more computational power.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126728274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 6DoF VR Dataset of 3D virtualWorld for Privacy-Preserving Approach and Utility-Privacy Tradeoff","authors":"Yu-Szu Wei, Xing Wei, Shin-Yi Zheng, Cheng-Hsin Hsu, Chenyang Yang","doi":"10.1145/3587819.3592557","DOIUrl":"https://doi.org/10.1145/3587819.3592557","url":null,"abstract":"Virtual Reality (VR) applications offer an immersive user experience at the expense of privacy leakage caused by inevitably streaming various new types of user data. While some privacy-preserving approaches have been proposed for protecting one type of data, how to design and evaluate approaches for multiple types of user data are still open. On the other hand, preserving privacy will degrade the quality of experience of VR applications or say the utility of user data. How to achieve efficient utility-privacy tradeoff with multiple types of data is also open. Both call for a dataset that contains multiple types of user data and personal attributes of users as ground-truth values. In this paper, we collect a 6 degree-of-freedom VR dataset of 3D virtual worlds for the investigation of privacy-preserving approaches and utility-privacy tradeoff.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122192203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Predict Head Pose in Remotely-Rendered Virtual Reality","authors":"G. Illahi, Ashutosh Vaishnav, Teemu Kämäräinen, M. Siekkinen, Mario Di Francesco","doi":"10.1145/3587819.3590972","DOIUrl":"https://doi.org/10.1145/3587819.3590972","url":null,"abstract":"Accurate characterization of Head Mounted Display (HMD) pose in a virtual scene is essential for rendering immersive graphics in Extended Reality (XR). Remote rendering employs servers in the cloud or at the edge of the network to overcome the computational limitations of either standalone or tethered HMDs. Unfortunately, it increases the latency experienced by the user; for this reason, predicting HMD pose in advance is highly beneficial, as long as it achieves high accuracy. This work provides a thorough characterization of solutions that forecast HMD pose in remotely-rendered virtual reality (VR) by considering six degrees of freedom. Specifically, it provides an extensive evaluation of pose representations, forecasting methods, machine learning models, and the use of multiple modalities along with joint and separate training. In particular, a novel three-point representation of pose is introduced together with a data fusion scheme for long-term short-term memory (LSTM) neural networks. Our findings show that machine learning models benefit from using multiple modalities, even though simple statistical models perform surprisingly well. Moreover, joint training is comparable to separate training with carefully chosen pose representation and data fusion strategies.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124639931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-automatic mulsemedia authoring analysis from the user's perspective","authors":"R. Abreu, D. Mattos, J. Santos, George Guinea, D. Muchaluat-Saade","doi":"10.1145/3587819.3590979","DOIUrl":"https://doi.org/10.1145/3587819.3590979","url":null,"abstract":"Mulsemedia (Multiple Sensorial Media) authoring is a complex task that requires the author to scan the media content to identify the moments to activate sensory effects. A novel proposal is to integrate content recognition algorithms into authoring tools to alleviate the authoring effort. Such algorithms could potentially replace the work of the human author when analyzing audiovisual content, by performing automatic extraction of sensory effects. Besides that, the semi-automatic method proposes to maintain the author subjectivity, allowing the author to define which sensory effects should be automatically extracted. This paper presents an evaluation of the proposed semi-automatic authoring considering the point of view of users. Experiments were done with the STEVE 2.0 mulsemedia authoring tool. Our work uses the GQM (Goal Question Metric) methodology, a questionnaire for collecting users' feedback, and analyzes the results. We conclude that users believe that the semi-automatic authoring is a positive addition to the authoring method.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security-Preserving Live 3D Video Surveillance","authors":"Zhongze Tang, Huy Phan, Xianglong Feng, Bo Yuan, Yao Liu, Sheng Wei","doi":"10.1145/3587819.3590975","DOIUrl":"https://doi.org/10.1145/3587819.3590975","url":null,"abstract":"3D video surveillance has become the new trend in security monitoring with the popularity of 3D depth cameras in the consumer market. While enabling more fruitful surveillance features, the finer-grained 3D videos being captured would raise new security concerns that have not been addressed by existing research. This paper explores the security implications of live 3D surveillance videos in triggering biometrics-related attacks, such as face ID spoofing. We demonstrate that the state-of-the-art face authentication systems can be effectively compromised by the 3D face models presented in the surveillance video. Then, to defend against such face spoofing attacks, we propose to proactively and benignly inject adversarial perturbations to the surveillance video in real time, prior to the exposure to potential adversaries. Such dynamically generated perturbations can prevent the face models from being exploited to bypass deep learning-based face authentications while maintaining the required quality and functionality of the 3D video surveillance. We evaluate the proposed perturbation generation approach on both an RGB-D dataset and a 3D video dataset, which justifies its effective security protection, low quality degradation, and real-time performance.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123607223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The World is Too Big to Download: 3D Model Retrieval for World-Scale Augmented Reality","authors":"Yi-Zhen Tsai, James Luo, Yunshu Wang, Jiasi Chen","doi":"10.1145/3587819.3590970","DOIUrl":"https://doi.org/10.1145/3587819.3590970","url":null,"abstract":"World-scale augmented reality (AR) is a form of AR where users move around the real world, viewing and interacting with 3D models at specific locations. However, given the geographical scale of world-scale AR, pre-fetching and storing numerous high-quality 3D models locally on the device is infeasible. For example, it would be impossible to download and store 3D ads from all the storefronts in a city onto a single device. A key challenge is thus deciding which remotely-stored 3D models should be fetched onto the AR device from an edge server, in order to render them in a timely fashion - yet with high visual quality - on the display. In this work, we propose a 3D model retrieval framework that makes intelligent decisions of which quality of 3D models to fetch, and when. The optimization decision is based on quality-compression tradeoffs, network bandwidth, and predictions of which 3D models the AR user is likely to view next. To support our framework, we collect real-world traces of AR users playing a world-scale AR game, and use this to drive our simulation and prediction modules. Our results show that the proposed framework can achieve higher visual quality of the 3D models while missing fewer display deadlines (by 20%) and wasting fewer bytes (by 10x), compared to a baseline approach of pre-fetching models within a fixed distance of the user.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"201 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130444666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Source Toolkit for Live End-to-End 4K VVC Intra Coding","authors":"Marko Viitanen, Joose Sainio, Alexandre Mercat, Guillaume Gautier, Jarno Vanne, Ibrahim Farhat, Pierre-Loup Cabarat, W. Hamidouche, D. Ménard","doi":"10.1145/3587819.3593938","DOIUrl":"https://doi.org/10.1145/3587819.3593938","url":null,"abstract":"Versatile Video Coding (VVC/H.266) takes video coding to the next level by doubling the coding efficiency over its predecessors for the same subjective quality, but at the cost of immense coding complexity. Therefore, VVC calls for aggressively optimized codecs to make it feasible for live streaming media applications. This paper introduces the first public end-to-end (E2E) pipeline for live 4K30p VVC intra coding and streaming. The pipeline is made up of three open-source components: 1) uvg266 for VVC encoding; 2) uvgRTP for VVC streaming; and 3) OpenVVC for VVC decoding. The proposed setup is demonstrated with a proof-of-concept prototype that implements the encoder end on AMD ThreadRipper 2990WX and the decoder end on Nvidia Jetson AGX Orin. Our prototype is almost 34 000 times as fast as the corresponding E2E pipeline built around the VTM codec. Respectively, it achieves 3.3 times speedup without any significant coding overhead over the pipeline that utilizes the fastest possible configuration of the well-known VVenC/VVdeC codec. These results indicate that our prototype is currently the only viable open-source solution for live 4K VVC intra coding and streaming.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124568457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"patchVVC: A Real-time Compression Framework for Streaming Volumetric Videos","authors":"Ru Chen, Mengbai Xiao, Dongxiao Yu, Guanghui Zhang, Yao Liu","doi":"10.1145/3587819.3590983","DOIUrl":"https://doi.org/10.1145/3587819.3590983","url":null,"abstract":"Nowadays, volumetric video has emerged as an attractive multimedia application, which provides highly immersive watching experiences. However, streaming the volumetric video demands prohibitively high bandwidth. Thus, effectively compressing its underlying point cloud frames is essential to deploying the volumetric videos. The existing compression techniques are either 3D-based or 2D-based, but they still have drawbacks when being deployed in practice. The 2D-based methods compress the videos in an effective but slow manner, while the 3D-based methods feature high coding speeds but low compression ratios. In this paper, we propose patchVVC, a 3D-based compression framework that reaches both a high compression ratio and a real-time decoding speed. More importantly, patchVVC is designed based on point cloud patches, which makes it friendly to an field of view adaptive streaming system that further reduces the bandwidth demands. The evaluation shows patchVCC achieves the real-time decoding speed and the comparable compression ratios as the representative 2D-based scheme, V-PCC, in an FoV-adaptive streaming scenario.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114503506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Low Bit-Rate MPEG V-PCC-encoded Volumetric Video Streaming with 3D Sub-sampling","authors":"Yuang Shi, Pranav Venkatram, Yifan Ding, Wei Tsang Ooi","doi":"10.1145/3587819.3590981","DOIUrl":"https://doi.org/10.1145/3587819.3590981","url":null,"abstract":"MPEG's Video-based Point Cloud Compression (V-PCC) is a recent new standard for volumetric video compression. By mapping a 3D dynamic point cloud to a 2D image sequence, V-PCC can rely on state-of-the-art video codecs to achieve high compression rate while maintaining the visual fidelity of the point cloud sequence. The quality of a compressed point cloud degrades steeply, however, below the operational bit-rate range of the video codec. In this work, we show that redundant information inherent in a 3D point cloud can be exploited to further extend the bit-rate range of the V-PCC codec, enabling it to operate in a low bit-rate scenario that is important in the context of volumetric video streaming. By simplifying the 3D point clouds through down-sampling and down-scaling during the encoding phase, and reversing the process during the decoding phase, we show that V-PCC could achieve up to 2.1 dB improvement in peak signal-to-noise ratio (PSNR), 7.1% improvement in structural similarity index (SSIM) and 14.8 improvement in video multimethod assessment fusion (VMAF) of the rendered point clouds at the same bit-rate and correspondingly up to 48.5% lower bit-rate at the same image quality.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130720514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}