S. Dijkstra-Soudarissanane, S. Gunkel, A. Gabriel, Leonor Fermoselle, F. T. Haar, O. Niamut
{"title":"XR Carousel: A Visualization Tool For Volumetric Video","authors":"S. Dijkstra-Soudarissanane, S. Gunkel, A. Gabriel, Leonor Fermoselle, F. T. Haar, O. Niamut","doi":"10.1145/3458305.3478436","DOIUrl":"https://doi.org/10.1145/3458305.3478436","url":null,"abstract":"Recent years have seen a new uptake in immersive media and eXtended Reality (XR). And due to a global pandemic, computer-mediated communication over video conferencing tools became a new normal of everyday remote collaboration and virtual meetings. Social XR leverages XR technologies for remote communication and collaboration. But in order for XR to facilitate a high level of (social) presence and thus high-quality mediated social contact between users, we need high-quality 3D representation of users. One approach to providing detailed 3D user representations as new immersive media is to use point clouds or meshes, but these representation formats come with complexity on compression bitrate and processing time. In the example of virtual meetings, compression has to fulfill stringent requirements such as low latency and high quality. As the compression techniques for 3D immersive media steadily advance, it is important to be able to easily compare different compression techniques on their technical and visual merits. The proposed demonstrator in this paper is a visualization tool that helps assess the visual quality of a 3D representation employing various coding schemes. The complete end-to-end rendering/encoding chain can be easily assessed, allowing for subjective testing by showing the differences between the selected encoding parameters. The tool presented in this demo paper offers an improved and easy visual process for the comparison of encoders of immersive media.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132198111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based Network for Dynamic Point Cloud Prediction","authors":"P. Gomes","doi":"10.1145/3458305.3478463","DOIUrl":"https://doi.org/10.1145/3458305.3478463","url":null,"abstract":"Dynamic point clouds have enabled the rise of virtual reality applications. However, due to their voluminous size, point clouds require efficient compression methods. While a few articles have addressed the compression of dynamic point clouds by exploring temporal redundancies between sequential frames, very few have explored point cloud prediction as a tool for efficient compression. In this PhD thesis, we propose an end-to-end learning network to predict future frames in a point cloud sequence. To address the challenges present in point cloud processing, namely the lack of structure, we propose a graph-based approach to learn topological information of point clouds as geometric features. Early results demonstrate that our method is able to make accurate predictions and can be applied in a compression algorithm.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131535664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing QoE and Latency of Live Video Streaming Using Edge Computing and In-Network Intelligence","authors":"A. Erfanian","doi":"10.1145/3458305.3478459","DOIUrl":"https://doi.org/10.1145/3458305.3478459","url":null,"abstract":"Live video streaming traffic and related applications have experienced significant growth in recent years. More users have started generating and delivering live streams with high quality (e.g., 4K resolution) through popular online streaming platforms such as YouTube, Twitch, and Facebook. Typically, the video contents are generated by streamers and watched by many audiences, which are geographically distributed in various locations far away from the streamers' locations. The resource limitation in the network (e.g., bandwidth) is a challenging issue for network and video providers to meet the users' requested quality. In this thesis, we will investigate optimizing QoE and end-to-end (E2E) latency of live video streaming by leveraging edge computing capabilities and in-network intelligence. We present four main research questions aiming to address the various challenges in optimizing live streaming QoE and E2E latency by employing edge computing and in-network intelligence.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132578606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EvLag: A Tool for Monitoring and Lagging Linux Input Devices","authors":"Shengmei Liu, M. Claypool","doi":"10.1145/3458305.3478449","DOIUrl":"https://doi.org/10.1145/3458305.3478449","url":null,"abstract":"Understanding the effects of latency on interaction is important for building software, such as computer games, that perform well over a range of system configurations. Unfortunately, user studies evaluating latency must each write their own code to add latency to user input and, even worse, must limit themselves to open source applications. To address these shortcomings, this paper presents EvLag, a tool for adding latency to user input devices in Linux. EvLag provides a custom amount of latency for each device regardless of the application being run, enabling user studies for systems and software that cannot be modified (e.g., commercial games). Evaluation shows EvLag has low overhead and accurately adds the expected amount of latency to user input. In addition, EvLag can log user input events for post study analysis with several utilities provided to facilitate output event parsing.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125305452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ignacio Reimat, E. Alexiou, Jack Jansen, Irene Viola, S. Subramanyam, Pablo César
{"title":"CWIPC-SXR: Point Cloud dynamic human dataset for Social XR","authors":"Ignacio Reimat, E. Alexiou, Jack Jansen, Irene Viola, S. Subramanyam, Pablo César","doi":"10.1145/3458305.3478452","DOIUrl":"https://doi.org/10.1145/3458305.3478452","url":null,"abstract":"Real-time, immersive telecommunication systems are quickly becoming a reality, thanks to the advances in acquisition, transmission, and rendering technologies. Point clouds in particular serve as a promising representation in these types of systems, offering photorealistic rendering capabilities with low complexity. Further development of transmission, coding, and quality evaluation algorithms, though, is currently hindered by the lack of publicly available datasets that represent realistic scenarios of remote communication between people in real-time. In this paper, we release a dynamic point cloud dataset that depicts humans interacting in social XR settings. Using commodity hardware, we capture a total of 45 unique sequences, according to several use cases for social XR. As part of our release, we provide annotated raw material, resulting point cloud sequences, and an auxiliary software toolbox to acquire, process, encode, and visualize data, suitable for real-time applications. The dataset can be accessed via the following link: https://www.dis.cwi.nl/cwipc-sxr-dataset/.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"318 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116421722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Griwodz, Simone Gasparini, L. Calvet, Pierre Gurdjos, Fabien Castan, Benoit Maujean, Gregoire De Lillo, Yann Lanthony
{"title":"AliceVision Meshroom: An open-source 3D reconstruction pipeline","authors":"C. Griwodz, Simone Gasparini, L. Calvet, Pierre Gurdjos, Fabien Castan, Benoit Maujean, Gregoire De Lillo, Yann Lanthony","doi":"10.1145/3458305.3478443","DOIUrl":"https://doi.org/10.1145/3458305.3478443","url":null,"abstract":"This paper introduces the Meshroom software and its underlying 3D computer vision framework AliceVision. This solution provides a photogrammetry pipeline to reconstruct 3D scenes from a set of unordered images. It also features other pipelines for fusing multi-bracketing low dynamic range images into high dynamic range, stitching multiple images into a panorama and estimating the motion of a moving camera. Meshroom's node-graph architecture allows the user to customize the different pipelines to adjust them to their domain specific needs. The user can interactively add other processing nodes to modify a pipeline, export intermediate data to analyze the result of the algorithms and easily compare the outputs given by different sets of parameters. The software package is released in open source and relies on open file formats. These features enable researchers to conveniently run the pipelines, access and visualize the data at each step, thus promoting the sharing and the reproducibility of the results.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117250417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"REEFT-360: Real-time Emulation and Evaluation Framework for Tile-based 360 Streaming under Time-varying Conditions","authors":"Eric Lindskog, Niklas Carlsson","doi":"10.1145/3458305.3478453","DOIUrl":"https://doi.org/10.1145/3458305.3478453","url":null,"abstract":"With 360° video streaming, the user's field of view (a.k.a. viewport) is at all times determined by the user's current viewing direction. Since any two users are unlikely to look in the exact same direction as each other throughout the viewing of a video, the frame-by-frame video sequence displayed during a playback session is typically unique. This complicates the direct comparison of the perceived Quality of Experience (QoE) using popular metrics such as the Multiscale-Structural Similarity (MS-SSIM). Furthermore, there is an absence of light-weight emulation frameworks for tile-based 360° video streaming that allow easy testing of different algorithm designs and tile sizes. To address these challenges, we present REEFT-360, which consists of (1) a real-time emulation framework that captures tile-quality adaptation under time-varying bandwidth conditions and (2) a multi-step evaluation process that allows the calculation of MS-SSIM scores and other frame-based metrics, while accounting for the user's head movements. Importantly, the framework allows speedy implementation and testing of alternative head-movement prediction and tile-based prefetching solutions, allows testing under a wide range of network conditions, and can be used either with a human user or head-movement traces. The developed software tool is shared with the paper. We also present proof-of-concept evaluation results that highlight the importance of including a human subject in the evaluation.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132409534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-Aware Playback Speed Control for Low-Latency Live Streaming of Sports","authors":"O. Aladag, Deniz Ugur, Mehmet N. Akcay, A. Begen","doi":"10.1145/3458305.3478437","DOIUrl":"https://doi.org/10.1145/3458305.3478437","url":null,"abstract":"There are two main factors that determine the viewer experience during the live streaming of sports content: latency and stalls. Latency should be low and stalls should not occur. Yet, these two factors work against each other and it is not trivial to strike the best trade-off between them. One of the best tools we have today to manage this trade-off is the adaptive playback speed control. This tool allows the streaming client to slow down the playback when there is a risk of stalling and increase the playback speed when there is no risk of stalling but the live latency is higher than desired. While adaptive playback generally works well, the artifacts due to the changes in the playback speed should preferably be unnoticeable to the viewers. However, this mostly depends on the portion of the audio/video content subject to the playback speed change. In this paper, we advance the state-of-the-art by developing a content-aware playback speed control (CAPSC) algorithm and demonstrate a number of examples showing its significance. We make the running code available and provide a demo page hoping that it will be a useful tool for the developers and content providers.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122472404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Estimation of Encrypted Video Streaming in Light of End-User Playback-Related Interactions","authors":"Ivan Bartolec","doi":"10.1145/3458305.3478467","DOIUrl":"https://doi.org/10.1145/3458305.3478467","url":null,"abstract":"Our research will look into realistic end-user service usage behavior patterns and their corresponding implications on the in-network Quality of Experience (QoE) monitoring for HTTP adaptive video streaming (HAS) services in wireless and mobile networks. The main goal is to establish a methodology for developing and testing machine learning (ML) models for estimating end-user QoE-related Key Performance Indicators (KPIs) in the context of user-initiated playback interactions. The initial phase will be to investigate user behavior when utilizing video streaming services on mobile devices and propose a user interaction model. In addition, a methodology for automated data collection, processing, and analysis will be created, which will include the creation of a framework that combines user interaction simulation based on the proposed model. Extensive experiments will be carried out to train ML models for KPI estimation, and the resultant KPI estimation models will be evaluated. This paper presents a current state-of-the-art review of the corresponding topics, as well as the current state of our research and preliminary findings.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117287440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rising Cellular Multimedia IoT: the Call for an Application-Aware Resource Management","authors":"Paniz Parastar","doi":"10.1145/3458305.3478462","DOIUrl":"https://doi.org/10.1145/3458305.3478462","url":null,"abstract":"Traditional Mobile broadband (MBB) will not have the lion's share of cellular network anymore with the billion things connected to the Internet. The design of the network has to change to adapt to the demands of various applications, and especially the Internet of Things (IoT). Moreover, multimedia IoT (M-IoT) applications with massive data volume and different requirements than both IoT and MBB are new challenges. In this paper, we remark some points that indicate we need an efficient network management system to deal with various M-IoT applications in future cellular networks. We also present connected cars as a case study and investigate their characteristics and demands.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121142214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}