{"title":"LiveROI","authors":"Xianglong Feng, Weitian Li, Sheng Wei","doi":"10.1145/3458305.3463378","DOIUrl":"https://doi.org/10.1145/3458305.3463378","url":null,"abstract":"Virtual reality (VR) streaming can provide an immersive video viewing experience to end users, but at the cost of huge bandwidth consumption. Recent research has adopted selective streaming to address the bandwidth challenge, which predicts and streams the user's viewport of interest in high quality and the remaining portions of the video in low quality. However, the existing viewport prediction mechanisms mainly target the video-on-demand (VOD) scenario, relying on historical video and user trace data to build the prediction model. The community still lacks an effective viewport prediction approach to support live VR streaming, the most engaging and popular VR streaming experience. We develop a region of interest (ROI)-based viewport prediction approach, namely LiveROI, for live VR streaming. LiveROI employs an action recognition algorithm to analyze the video content and uses the analysis results as the basis of viewport prediction. To eliminate the need for historical video/user data, LiveROI employs adaptive user preference modeling and word embedding to dynamically select the video viewport at runtime based on the user's head orientation. We evaluate LiveROI with 12 VR videos viewed by 48 users obtained from a public VR head movement dataset.
The results show that LiveROI achieves high prediction accuracy and significant bandwidth savings with real-time processing to support live VR streaming.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123567767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foresight: planning for spatial and temporal variations in bandwidth for streaming services on mobile devices","authors":"Manasvini Sethuraman, Anirudh Sarma, Ashutosh Dhekne, U. Ramachandran","doi":"10.1145/3458305.3463384","DOIUrl":"https://doi.org/10.1145/3458305.3463384","url":null,"abstract":"Spatiotemporal variation in cellular bandwidth availability is well-known and could affect a mobile user's quality of experience (QoE), especially while using bandwidth-intensive streaming applications such as movies, podcasts, and music videos during commute. If such variations are made available to a streaming service in advance, it could plan ahead to avoid sub-optimal performance while the user travels through regions of low bandwidth availability. The intuition is that such future knowledge could be used to buffer additional content in regions of higher bandwidth availability to tide over the deficits in regions of low bandwidth availability. Foresight is a service designed to provide this future knowledge for client apps running on a mobile device. It comprises three components: (a) a crowd-sourced bandwidth estimate reporting facility, (b) an on-cloud bandwidth service that records the spatiotemporal variations in bandwidth and serves queries for bandwidth availability from mobile users, and (c) an on-device bandwidth manager that caters to the bandwidth requirements of client apps by providing them with bandwidth allocation schedules. Foresight is implemented in the Android framework. As a proof of concept for using this service, we have modified an open-source video player---ExoPlayer---to use the results of Foresight in its video buffer management. Our performance evaluation shows Foresight's scalability.
We also showcase how Foresight enables ExoPlayer to enhance video quality of experience (QoE) despite spatiotemporal bandwidth variations, with a higher overall playback bitrate, fewer bitrate switches, and fewer stalls during video playback.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125565808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DataPlanner","authors":"Yanyuan Qin, Chinmaey Shende, Cheonjin Park, S. Sen, Bing Wang","doi":"10.1145/3458305.3459596","DOIUrl":"https://doi.org/10.1145/3458305.3459596","url":null,"abstract":"Over-the-top (OTT) video streaming accounts for the majority of traffic on cellular networks and also places a heavy demand on users' limited monthly cellular data budgets. In contrast to much of the traditional research that focuses on improving quality, we explore a different direction: using data budget information to better manage the data usage of mobile video streaming while minimizing the impact on users' quality of experience (QoE). Specifically, we propose a novel framework for quality-aware Adaptive Bitrate (ABR) streaming involving a per-session data budget constraint. Under the framework, we develop two planning-based strategies, one for the case where fine-grained perceptual quality information is known to the planning scheme, and another for the case where such information is not available. Evaluations for a wide range of network conditions, using different videos covering a variety of content types and encodings, demonstrate that both strategies use much less data than state-of-the-art ABR schemes, while still providing comparable QoE.
Our proposed approach is designed to work in conjunction with existing ABR streaming workflows, enabling ease of adoption.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123795834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EScALation: a framework for efficient and scalable spatio-temporal action localization","authors":"Bo Chen, K. Nahrstedt","doi":"10.1145/3458305.3459598","DOIUrl":"https://doi.org/10.1145/3458305.3459598","url":null,"abstract":"Spatio-temporal action localization aims to detect the spatial location and the start/end time of an action in a video. The state-of-the-art approach uses convolutional neural networks to extract possible bounding boxes for the action in each frame and then link bounding boxes into action tubes based on the location and the class-specific score of each bounding box. Though this approach has been successful at achieving good localization accuracy, it is computation-intensive, and high-end GPUs are usually required for it to achieve real-time performance. In addition, this approach does not scale well to a large number of action classes. In this work, we present a framework, EScALation, for making spatio-temporal action localization efficient and scalable. Our framework involves two main strategies. One is a frame sampling technique that utilizes the temporal correlation between frames and selects key frame(s) from a temporally correlated set of frames to perform bounding box detection. The other is a class filtering technique that exploits bounding box information to predict the action class prior to linking bounding boxes. We compare EScALation with the state-of-the-art approach on the UCF101-24 and J-HMDB-21 datasets. One of our experiments shows that EScALation saves 72.2% of the time with only a 6.1% loss of mAP.
In addition, we show that EScALation scales better to a large number of action classes than the state-of-the-art approach.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116459728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Livelyzer","authors":"Xiao Zhu, S. Sen, Z. Morley Mao","doi":"10.1145/3458305.3463375","DOIUrl":"https://doi.org/10.1145/3458305.3463375","url":null,"abstract":"Over-the-top (OTT) live video traffic has grown significantly, fueled by fundamental shifts in how users consume video content (e.g., increased cord-cutting) and by improvements in camera technologies, computing power, and wireless resources. A key determining factor for the end-to-end live streaming QoE is the design of the first-mile upstream ingest path that captures and transmits the live content in real time, from the broadcaster to the remote video server. This path often involves either a Wi-Fi or cellular component, and is likely to be bandwidth-constrained with time-varying capacity, making the task of high-quality video delivery challenging. Today, there is little understanding of the state of the art in the design of this critical path, with existing research focused mainly on the downstream distribution path, from the video server to end viewers. To shed more light on the first-mile ingest aspect of live streaming, we propose Livelyzer, a generalized active measurement and black-box testing framework for analyzing the performance of this component in popular live streaming software and services under controlled settings.
We use Livelyzer to characterize the ingest behavior and performance of several live streaming platforms, identify design deficiencies that lead to poor performance, and propose best-practice design recommendations to remedy them.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114996486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards cloud-edge collaborative online video analytics with fine-grained serverless pipelines","authors":"Miao Zhang, Fangxin Wang, Yifei Zhu, Jiangchuan Liu, Zhi Wang","doi":"10.1145/3458305.3463377","DOIUrl":"https://doi.org/10.1145/3458305.3463377","url":null,"abstract":"The ever-growing deployment scale of surveillance cameras and users' increasing appetite for real-time queries have spurred online video analytics. Synergizing the virtually unlimited cloud resources with agile edge processing would deliver an ideal online video analytics system; yet, given the complex interactions and dependencies within and across video query pipelines, it is easier said than done. This paper starts with a measurement study to acquire a deep understanding of video query pipelines on real-world camera streams. We identify the potential of, and the practical challenges in, cloud-edge collaborative video analytics. We then argue that the newly emerged serverless computing paradigm is the key to achieving fine-grained resource partitioning with minimum dependency. We accordingly propose CEVAS, a Cloud-Edge collaborative Video Analytics system empowered by fine-grained Serverless pipelines. It builds flexible serverless-based infrastructures to facilitate fine-grained and adaptive partitioning of cloud-edge workloads for multiple concurrent query pipelines. With the optimized design of individual modules and their integration, CEVAS achieves real-time responses to highly dynamic input workloads. We have developed a prototype of CEVAS on Amazon Web Services (AWS) and conducted extensive experiments with real-world video streams and queries. The results show that by judiciously coordinating fine-grained serverless resources in the cloud and at the edge, CEVAS reduces the cloud expenditure of a pure cloud scheme by 86.9% and its data transfer overhead by 74.4%, and improves the analysis throughput of a pure edge scheme by up to 20.6%.
Thanks to its fine-grained, video-content-aware forecasting, CEVAS is also more adaptive than the state-of-the-art cloud-edge collaborative scheme.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121568591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion segmentation and tracking for integrating event cameras","authors":"Andrew C. Freeman, Christopher P. Burgess, Ketan Mayer-Patel","doi":"10.1145/3458305.3463373","DOIUrl":"https://doi.org/10.1145/3458305.3463373","url":null,"abstract":"Integrating event cameras are asynchronous sensors wherein incident light values may be measured directly through continuous integration, with individual pixels' light sensitivity being adjustable in real time, allowing for extremely high frame rate and high dynamic range video capture. This paper builds on lessons learned from previous attempts to compress event data and presents a new scheme for event compression that has many analogues to traditional framed video compression techniques. We show how traditional video can be transcoded to an event-based representation, and describe the direct encoding of motion data in our event-based representation. Finally, we present experimental results showing that our simple scheme already approaches state-of-the-art compression performance for slow-motion object tracking. This system introduces an application \"in the loop\" framework, where the application dynamically informs the camera how sensitive each pixel should be, based on the efficacy of the most recent data received.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123724759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CrossRoI","authors":"Hongpeng Guo, Shuochao Yao, Zhe Yang, Qian Zhou, Klara Nahrstedt","doi":"10.1145/3458305.3463381","DOIUrl":"https://doi.org/10.1145/3458305.3463381","url":null,"abstract":"Video cameras are pervasively deployed at city scale for public good and community safety (e.g., traffic monitoring or suspect tracking). However, analyzing large-scale video feeds in real time is data intensive and poses severe challenges to today's network and computation systems. We present CrossRoI, a resource-efficient system that enables real-time video analytics at scale by harnessing the content associations and redundancy across a fleet of cameras. CrossRoI exploits the intrinsic physical correlations of cross-camera viewing fields to drastically reduce communication and computation costs. CrossRoI removes the repeated appearances of the same objects in multiple cameras without harming comprehensive coverage of the scene. CrossRoI operates in two phases: an offline phase to establish cross-camera correlations, and an efficient online phase for real-time video inference. Experiments on real-world video feeds show that CrossRoI achieves a 42%~65% reduction in network overhead and a 25%~34% reduction in response delay in real-time video analytics applications with more than 99% query accuracy, when compared to baseline methods.
If integrated with state-of-the-art frame filtering systems, the performance gains of CrossRoI reach 50%~80% (network overhead) and 33%~61% (end-to-end delay).","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122671301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPEG NBMP testbed for evaluation of real-time distributed media processing workflows at scale","authors":"Roberto Ramos-Chavez, R. Mekuria, Theodoros Karagkioules, Dirk Griffioen, Arjen Wagenaar, Mark Ogle","doi":"10.1145/3458305.3463380","DOIUrl":"https://doi.org/10.1145/3458305.3463380","url":null,"abstract":"Real-time Distributed Media Processing Workflows (DMPW) are popular for online media delivery. Combining distributed media sources and processing can reduce storage costs and increase flexibility. However, high request rates may result in unacceptable latency or even failures under incorrect configurations. Thus, testing DMPW deployments at scale is key, particularly for real-time cases. We propose using the new MPEG Network-Based Media Processing (NBMP) standard for this purpose and present a testbed implementation that includes all the reference components. In addition, the testbed includes a set of configurable functions for load generation, monitoring, data collection, and visualization. The testbed is used to test Dynamic Adaptive HTTP streaming functions under different workloads in a standardized and reproducible manner. A total of 327 tests with different loads and real-time DMPW configurations were completed. The results provide insights into the performance, reliability, and time-consistency of each configuration. Based on these tests, we selected the preferred cloud instance type, considering hypervisor options and different function implementation configurations. Further, we analyzed different processing tasks and options for distributed deployments on edge and centralized clouds. Lastly, a classifier was developed to detect whether failures happen under a given workload.
Results also show that the normalized inter-experiment standard deviation of the metric means can be an indicator of unstable or incorrect configurations.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127153617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}