Proceedings of the 14th Conference on ACM Multimedia Systems: Latest Publications

MOSAIC: Spatially-Multiplexed Edge AI Optimization over Multiple Concurrent Video Sensing Streams
Pub Date: 2023-05-05 | DOI: 10.1145/3587819.3590986
Authors: Ila Gokarn, H. Sabbella, Yigong Hu, T. Abdelzaher, Archan Misra
Abstract: Sustaining high fidelity and high throughput of perception tasks over vision sensor streams on edge devices remains a formidable challenge, especially given the continuing increase in image sizes (e.g., generated by 4K cameras) and the complexity of DNN models. One promising approach involves criticality-aware processing, where computation is directed selectively to "critical" portions of individual image frames. We introduce MOSAIC, a novel system for such criticality-aware concurrent processing of multiple vision sensing streams that provides a multiplicative increase in the achievable throughput with negligible loss in perception fidelity. MOSAIC determines critical regions from images received from multiple vision sensors and spatially bin-packs these regions, using a novel multi-scale Mosaic Across Scales (MoS) tiling strategy, into a single 'canvas frame' sized such that the edge device can retain sufficiently high processing throughput. Experimental studies using benchmark datasets for two tasks, Automatic License Plate Recognition and Drone-based Pedestrian Detection, show that MOSAIC, executing on a Jetson TX2 edge device, can provide dramatic gains in the throughput vs. fidelity tradeoff. For instance, for drone-based pedestrian detection with a batch size of 4, MOSAIC can pack input frames from 6 cameras to achieve 4.75x (475%) higher throughput (23 FPS per camera, 138 FPS cumulatively) with ≤ 1% accuracy loss, compared to a First Come First Serve (FCFS) processing paradigm.
Citations: 2
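To make the packing step concrete, here is a minimal Python sketch of greedily shelf-packing critical regions from several streams into one fixed-size canvas frame. It is illustrative only: the paper's multi-scale MoS tiling also rescales regions across scales, and all names below (Region, shelf_pack) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A critical region detected in one sensor stream."""
    stream_id: int
    x: int  # location in the source frame
    y: int
    w: int  # size in pixels
    h: int

def shelf_pack(regions, canvas_w, canvas_h):
    """Greedily place regions onto a fixed canvas, row ("shelf") by row.

    Returns placements as (region, canvas_x, canvas_y); regions that do
    not fit are deferred to the next canvas frame. A simple stand-in for
    the paper's Mosaic Across Scales (MoS) tiling, which is multi-scale.
    """
    placements, deferred = [], []
    x = y = shelf_h = 0
    for r in sorted(regions, key=lambda r: r.h, reverse=True):
        if x + r.w > canvas_w:            # current shelf full: open a new one
            x, y, shelf_h = 0, y + shelf_h, 0
        if y + r.h > canvas_h or r.w > canvas_w:
            deferred.append(r)            # no room left on this canvas
            continue
        placements.append((r, x, y))
        x += r.w
        shelf_h = max(shelf_h, r.h)
    return placements, deferred
```

Sorting by height first keeps shelves dense, which is what lets one canvas carry regions from many cameras at once.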
Latency Target based Analysis of the DASH.js Player
Pub Date: 2023-04-26 | DOI: 10.1145/3587819.3590971
Authors: P. O'Hanlon, Adil Aslam
Abstract: We analyse the low-latency performance of the three Adaptive Bitrate (ABR) algorithms in the dash.js Dynamic Adaptive Streaming over HTTP (DASH) player with respect to a range of latency targets and configuration options. We perform experiments on our DASH Testbed, which allows for testing with a range of real-world derived network profiles. Our experiments enable a better understanding of how latency targets affect quality of experience (QoE), and how well the different algorithms adhere to their targets. We find that with dash.js v4.5.0 the default Dynamic algorithm achieves the best overall QoE. We show that whilst the other algorithms can achieve higher video quality at lower latencies, they do so only at the expense of increased stalling. We analyse the poor performance of L2A-LL in our tests and develop modifications which demonstrate significant improvements. We also highlight how some low latency configuration settings can be detrimental to performance.
Citations: 0
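The latency-target mechanism under study here is, at its core, a catch-up rule: the player speeds playback up when live latency exceeds the target and slows down when it runs ahead. The Python sketch below shows the idea in its simplest form; it is not dash.js's actual controller, whose behaviour is smoother and configurable per algorithm, and the parameter names are illustrative.

```python
def catchup_rate(live_latency, target, dead_zone=0.35, min_rate=0.7, max_rate=1.3):
    """Map the drift between measured live latency and the latency target
    (both in seconds) to a playback rate.

    Within the dead zone we play at 1.0x; outside it, the rate scales
    with the relative drift, clamped to bounds viewers can tolerate.
    """
    drift = live_latency - target
    if abs(drift) <= dead_zone:
        return 1.0
    rate = 1.0 + 0.5 * drift / max(target, 1e-6)
    return min(max_rate, max(min_rate, rate))

# e.g. 2.0 s behind a 3.0 s target: catchup_rate(5.0, 3.0) -> 1.3 (clamped)
```

Aggressive rate changes trade stalling risk for latency adherence, which is exactly the tension the paper measures across algorithms and targets.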
Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis
Pub Date: 2023-04-20 | DOI: 10.1145/3587819.3590988
Authors: Mindi Ruan, Xiang Yu, Naifeng Zhang, Chuanbo Hu, Shuo Wang, Xin Li
Abstract: How can we teach a computer to recognize 10,000 different actions? Deep learning has evolved from supervised and unsupervised to self-supervised approaches. In this paper, we present a new contrastive learning-based framework for decision tree-based classification of actions, including human-human interactions (HHI) and human-object interactions (HOI). The key idea is to translate the original multi-class action recognition problem into a series of binary classification tasks on a pre-constructed decision tree. Under the new framework of contrastive learning, we present the design of an interaction adjacent matrix (IAM) with skeleton graphs as the backbone for modeling various action-related attributes such as periodicity and symmetry. Through the construction of various pretext tasks, we obtain a series of binary classification nodes on the decision tree that can be combined to support higher-level recognition tasks. Experimental justification for the potential of our approach in real-world applications ranges from interaction recognition to symmetry detection. In particular, we demonstrate the promising performance of video-based autism spectrum disorder (ASD) diagnosis on the CalTech interview video database.
Citations: 1
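The paper's central translation, from one multi-class problem into a chain of binary decisions on a pre-built tree, can be sketched in a few lines of Python. In the paper each internal node is a contrastively trained binary classifier over IAM/skeleton features; in this sketch each node is simply a callable, and all names are illustrative.

```python
class Node:
    """One node of a pre-constructed decision tree over action classes."""
    def __init__(self, classifier=None, left=None, right=None, label=None):
        self.classifier = classifier  # binary decision: features -> bool
        self.left, self.right = left, right
        self.label = label            # action label, set only on leaves

def classify(root, features):
    """Answer one binary question per internal node until a leaf is hit,
    turning multi-class recognition into a series of binary tasks."""
    node = root
    while node.label is None:
        node = node.left if node.classifier(features) else node.right
    return node.label

# Toy 3-class tree ("wave", "hug", "handshake"); in the paper the node
# classifiers would be learned via contrastive pretext tasks.
tree = Node(classifier=lambda f: f["two_person"],
            left=Node(classifier=lambda f: f["symmetric"],
                      left=Node(label="handshake"), right=Node(label="hug")),
            right=Node(label="wave"))
print(classify(tree, {"two_person": True, "symmetric": True}))  # handshake
```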
FSVVD: A Dataset of Full Scene Volumetric Video
Pub Date: 2023-03-07 | DOI: 10.1145/3587819.3592551
Authors: Kaiyuan Hu, Yili Jin, Hao Yang, Junhua Liu, Fang Wang
Abstract: Recent years have witnessed rapid development of immersive multimedia, which bridges the gap between the real world and virtual space. Volumetric video, an emerging representative 3D video paradigm that empowers extended reality, stands out by providing an unprecedentedly immersive and interactive video watching experience. Despite the tremendous potential, research on 3D volumetric video is still in its infancy and relies on sufficient, complete datasets for further exploration. However, existing volumetric video datasets mostly include only a single object, lacking details of the scene and the interactions within it. In this paper, we focus on the currently most widely used data format, the point cloud, and for the first time release a full-scene volumetric video dataset that includes multiple people and their daily activities interacting with the external environment. We provide a comprehensive description and analysis of the dataset, together with its potential usage. The dataset and additional tools can be accessed via the following website: https://cuhksz-inml.github.io/full_scene_volumetric_video_dataset/.
Citations: 4
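Since the dataset is distributed as point-cloud frames, a typical first step is loading a frame sequence and inspecting per-frame statistics. The sketch below uses the third-party open3d library on .ply files; the actual file layout and naming of the FSVVD release are not described in the abstract, so the paths and the one-file-per-frame assumption are illustrative, and the dataset website should be consulted for the real format.

```python
import numpy as np
import open3d as o3d  # third-party: pip install open3d

def inspect_frames(frame_paths):
    """Load point-cloud frames and print basic per-frame statistics.

    Assumes one .ply file per frame (a common volumetric-video layout,
    not confirmed for FSVVD); each file holds xyz points, plus colors
    if present.
    """
    for path in frame_paths:
        pcd = o3d.io.read_point_cloud(path)
        pts = np.asarray(pcd.points)
        print(f"{path}: {len(pts)} points, "
              f"bbox min {pts.min(axis=0)}, max {pts.max(axis=0)}")

# Hypothetical layout: inspect_frames([f"scene01/frame_{i:04d}.ply" for i in range(300)])
```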
An Asynchronous Intensity Representation for Framed and Event Video Sources
Pub Date: 2023-01-20 | DOI: 10.1145/3587819.3590969
Authors: Andrew C. Freeman, Montek Singh, Ketan Mayer-Patel
Abstract: Neuromorphic "event" cameras, designed to mimic the human vision system with asynchronous sensing, unlock a new realm of high-speed and high-dynamic-range applications. However, researchers often either revert to a framed representation of event data for applications, or build bespoke applications for a particular camera's event data type. To usher in the next era of video systems, accommodate new event camera designs, and explore the benefits of asynchronous video in classical applications, we argue that there is a need for an asynchronous, source-agnostic video representation. In this paper, we introduce a novel asynchronous intensity representation for both framed and non-framed data sources. We show that our representation can increase intensity precision and greatly reduce the number of samples per pixel compared to grid-based representations. With framed sources, we demonstrate that by permitting a small amount of loss through the temporal averaging of stable pixel values, we can reduce our representational sample rate by more than half, while incurring a drop in VMAF quality score of only 4.5. We also demonstrate lower latency than the state-of-the-art method for fusing and transcoding framed and event camera data into an intensity representation, while maintaining 2000x the temporal resolution. We argue that our method provides the computational efficiency and temporal granularity necessary to build real-time intensity-based applications for event video.
Citations: 2
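One concrete piece of this pipeline, converting a framed source into sparse per-pixel intensity samples by re-emitting a pixel only when it drifts from its last emitted value, can be sketched as follows. This is a simplified stand-in for the paper's representation (which also handles true event streams and temporal averaging of stable pixels); the function and parameter names are illustrative.

```python
import numpy as np

def to_async_samples(frames, fps, threshold=4):
    """Turn framed video (a sequence of uint8 2-D arrays) into sparse
    (t, y, x, intensity) samples.

    A pixel is re-emitted only when it moves more than `threshold` away
    from its last emitted value, so stable pixels contribute almost no
    samples: lossy, in the spirit of the paper's temporal averaging of
    stable pixel values, but simpler.
    """
    samples, last = [], None
    for i, frame in enumerate(frames):
        t = i / fps
        if last is None:
            changed = np.ones(frame.shape, dtype=bool)  # emit everything once
            last = frame.copy()
        else:
            changed = np.abs(frame.astype(np.int16) - last.astype(np.int16)) > threshold
            last[changed] = frame[changed]
        ys, xs = np.nonzero(changed)
        samples += list(zip([t] * len(ys), ys.tolist(), xs.tolist(),
                            frame[ys, xs].tolist()))
    return samples
```

On mostly static scenes the sample count collapses far below width x height per frame, which is the source of the representational savings the abstract reports.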
Extending 3-DoF Metrics to Model User Behaviour Similarity in 6-DoF Immersive Applications
Pub Date: 2021-12-17 | DOI: 10.1145/3587819.3590976
Authors: Silvia Rossi, Irene Viola, Laura Toni, Pablo César
Abstract: Immersive reality technologies, such as Virtual and Augmented Reality, have ushered in a new era of user-centric systems, in which every aspect of the coding-delivery-rendering chain is tailored to the interaction of the users. Understanding the actual interactivity and behaviour of users is still an open challenge and a key step towards enabling such a user-centric system. Our main goal is to extend the applicability of existing behavioural methodologies for studying user navigation to the case of 6 Degrees of Freedom (DoF). Specifically, we first compare navigation in 6-DoF with its 3-DoF counterpart, highlighting the main differences and novelties. Then, we define new metrics aimed at better modelling behavioural similarities between users in a 6-DoF system. We validate and test our solutions on real navigation paths of users interacting with dynamic volumetric media under 6-DoF Virtual Reality conditions. Our results show that metrics that consider both user position and viewing direction perform better at detecting user similarity while navigating in a 6-DoF system. Having easy-to-use but robust metrics that underpin multiple tools and answer the question "how do we detect if two users look at the same content?" opens the gate to new solutions for a user-centric system.
Citations: 0
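The abstract's key finding, that good 6-DoF similarity metrics must combine where users stand with where they look, suggests a simple two-factor form. The toy metric below multiplies a positional-closeness term by a view-alignment term, each mapped to (0, 1]; it illustrates the idea only, not the metrics defined and validated in the paper, and the scale parameters are arbitrary.

```python
import numpy as np

def user_similarity(pos_a, dir_a, pos_b, dir_b, sigma_pos=1.0, sigma_ang=np.pi / 4):
    """Toy 6-DoF behavioural similarity in (0, 1].

    pos_*: 3-D user positions (metres); dir_*: 3-D viewing directions.
    Combines exponential decay over positional distance with exponential
    decay over the angle between viewing directions.
    """
    pos_term = np.exp(-np.linalg.norm(np.subtract(pos_a, pos_b)) / sigma_pos)
    cos = np.dot(dir_a, dir_b) / (np.linalg.norm(dir_a) * np.linalg.norm(dir_b))
    ang_term = np.exp(-np.arccos(np.clip(cos, -1.0, 1.0)) / sigma_ang)
    return pos_term * ang_term

# Same spot, same gaze -> 1.0; 90 degrees apart at the same spot -> exp(-2) ~ 0.14
```

A pure 3-DoF metric would keep only the angular term; the positional term is what the extension to 6-DoF adds.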
Proceedings of the 14th Conference on ACM Multimedia Systems
DOI: 10.1145/3587819