MOSAIC: Spatially-Multiplexed Edge AI Optimization over Multiple Concurrent Video Sensing Streams
Ila Gokarn, H. Sabbella, Yigong Hu, T. Abdelzaher, Archan Misra
Proceedings of the 14th Conference on ACM Multimedia Systems. DOI: https://doi.org/10.1145/3587819.3590986

Abstract: Sustaining high fidelity and high throughput for perception tasks over vision sensor streams on edge devices remains a formidable challenge, especially given the continuing increase in image sizes (e.g., from 4K cameras) and in the complexity of DNN models. One promising approach is criticality-aware processing, in which computation is directed selectively to "critical" portions of individual image frames. We introduce MOSAIC, a novel system for such criticality-aware concurrent processing of multiple vision sensing streams that provides a multiplicative increase in achievable throughput with negligible loss in perception fidelity. MOSAIC determines critical regions in the images received from multiple vision sensors and spatially bin-packs these regions, using a novel multi-scale Mosaic Across Scales (MoS) tiling strategy, into a single "canvas frame" sized so that the edge device retains sufficiently high processing throughput. Experimental studies using benchmark datasets for two tasks, Automatic License Plate Recognition and drone-based pedestrian detection, show that MOSAIC, executing on a Jetson TX2 edge device, provides dramatic gains in the throughput vs. fidelity trade-off. For instance, for drone-based pedestrian detection with a batch size of 4, MOSAIC packs input frames from 6 cameras to achieve 4.75X (475%) higher throughput (23 FPS per camera, 138 FPS cumulatively) with ≤ 1% accuracy loss, compared to a First Come First Serve (FCFS) processing paradigm.
{"title":"Latency Target based Analysis of the DASH.js Player","authors":"P. O'Hanlon, Adil Aslam","doi":"10.1145/3587819.3590971","DOIUrl":"https://doi.org/10.1145/3587819.3590971","url":null,"abstract":"We analyse the low latency performance of the three Adaptive Bitrate (ABR) algorithms in the dash.js Dynamic Adaptive Streaming over HTTP (DASH) player with respect to a range of latency targets and configuration options. We perform experiments on our DASH Testbed which allows for testing with a range of real world derived network profiles. Our experiments enable a better understanding of how latency targets affect quality of experience (QoE), and how well the different algorithms adhere to their targets. We find that with dash.js v4.5.0 the default Dynamic algorithm achieves the best overall QoE. We show that whilst the other algorithms can achieve higher video quality at lower latencies, they do so only at the expense of increased stalling. We analyse the poor performance of L2A-LL in our tests and develop modifications which demonstrate significant improvements. We also highlight how some low latency configuration settings can be detrimental to performance.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128597710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis","authors":"Mindi Ruan, Xiang Yu, Naifeng Zhang, Chuanbo Hu, Shuo Wang, Xin Li","doi":"10.1145/3587819.3590988","DOIUrl":"https://doi.org/10.1145/3587819.3590988","url":null,"abstract":"How can we teach a computer to recognize 10,000 different actions? Deep learning has evolved from supervised and unsupervised to self-supervised approaches. In this paper, we present a new contrastive learning-based framework for decision tree-based classification of actions, including human-human interactions (HHI) and human-object interactions (HOI). The key idea is to translate the original multi-class action recognition into a series of binary classification tasks on a pre-constructed decision tree. Under the new framework of contrastive learning, we present the design of an interaction adjacent matrix (IAM) with skeleton graphs as the backbone for modeling various action-related attributes such as periodicity and symmetry. Through the construction of various pretext tasks, we obtain a series of binary classification nodes on the decision tree that can be combined to support higher-level recognition tasks. Experimental justification for the potential of our approach in real-world applications ranges from interaction recognition to symmetry detection. In particular, we have demonstrated the promising performance of video-based autism spectrum disorder (ASD) diagnosis on the CalTech interview video database.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123833531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FSVVD: A Dataset of Full Scene Volumetric Video
Kaiyuan Hu, Yili Jin, Hao Yang, Junhua Liu, Fang Wang
Proceedings of the 14th Conference on ACM Multimedia Systems. DOI: https://doi.org/10.1145/3587819.3592551

Abstract: Recent years have witnessed the rapid development of immersive multimedia, which bridges the gap between the real world and virtual space. Volumetric video, an emerging 3D video paradigm that empowers extended reality, stands out by providing an unprecedentedly immersive and interactive video-watching experience. Despite this tremendous potential, research on 3D volumetric video is still in its infancy and relies on sufficient, complete datasets for further exploration. However, existing volumetric video datasets mostly contain only a single object, lacking the surrounding scene and the interactions between people and their environment. In this paper, we focus on the most widely used data format, the point cloud, and for the first time release a full-scene volumetric video dataset that includes multiple people and their daily activities interacting with the external environment. We provide a comprehensive description and analysis of the dataset, along with its potential uses. The dataset and additional tools can be accessed via the following website: https://cuhksz-inml.github.io/full_scene_volumetric_video_dataset/.
An Asynchronous Intensity Representation for Framed and Event Video Sources
Andrew C. Freeman, Montek Singh, Ketan Mayer-Patel
Proceedings of the 14th Conference on ACM Multimedia Systems. DOI: https://doi.org/10.1145/3587819.3590969

Abstract: Neuromorphic "event" cameras, designed to mimic the human vision system with asynchronous sensing, unlock a new realm of high-speed and high-dynamic-range applications. However, researchers often either revert to a framed representation of event data for applications, or build bespoke applications for a particular camera's event data type. To usher in the next era of video systems, accommodate new event camera designs, and explore the benefits of asynchronous video in classical applications, we argue that there is a need for an asynchronous, source-agnostic video representation. In this paper, we introduce a novel asynchronous intensity representation for both framed and non-framed data sources. We show that our representation can increase intensity precision and greatly reduce the number of samples per pixel compared to grid-based representations. With framed sources, we demonstrate that by permitting a small amount of loss through the temporal averaging of stable pixel values, we can reduce our representational sample rate by more than half, while incurring a drop in VMAF quality score of only 4.5. We also demonstrate lower latency than the state-of-the-art method for fusing and transcoding framed and event camera data to an intensity representation, while maintaining 2000X the temporal resolution. We argue that our method provides the computational efficiency and temporal granularity necessary to build real-time intensity-based applications for event video.
Extending 3-DoF Metrics to Model User Behaviour Similarity in 6-DoF Immersive Applications
Silvia Rossi, Irene Viola, Laura Toni, Pablo César
Proceedings of the 14th Conference on ACM Multimedia Systems. DOI: https://doi.org/10.1145/3587819.3590976

Abstract: Immersive reality technologies, such as Virtual and Augmented Reality, have ushered in a new era of user-centric systems, in which every aspect of the coding-delivery-rendering chain is tailored to the interactions of the users. Understanding the actual interactivity and behaviour of users is still an open challenge and a key step towards enabling such user-centric systems. Our main goal is to extend the applicability of existing behavioural methodologies for studying user navigation to the case of 6 Degrees of Freedom (6-DoF). Specifically, we first compare navigation in 6-DoF with its 3-DoF counterpart, highlighting the main differences and novelties. Then, we define new metrics aimed at better modelling behavioural similarity between users in a 6-DoF system. We validate and test our solutions on real navigation paths of users interacting with dynamic volumetric media under 6-DoF Virtual Reality conditions. Our results show that metrics that consider both user position and viewing direction perform better at detecting user similarity while navigating in a 6-DoF system. Easy-to-use yet robust metrics that underpin multiple tools and answer the question "how do we detect whether two users are looking at the same content?" open the door to new solutions for user-centric systems.
{"title":"Proceedings of the 14th Conference on ACM Multimedia Systems","authors":"","doi":"10.1145/3587819","DOIUrl":"https://doi.org/10.1145/3587819","url":null,"abstract":"","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130666938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}