Proceedings of the 14th Conference on ACM Multimedia Systems: Latest Publications

Machine-learning based VMAF prediction for HDR video content
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3593941
Christoph Müller, Stephan Steglich, Sandra Groß, Paul Kremer
Abstract: This paper presents a methodology for predicting VMAF video quality scores for high dynamic range (HDR) video content using machine learning. To train the ML model, we are collecting a dataset of HDR and converted SDR video clips, as well as their corresponding objective video quality scores, specifically the Video Multimethod Assessment Fusion (VMAF) values. A 3D convolutional neural network (3D-CNN) model is being trained on the collected dataset. Finally, a hands-on demonstrator is developed to showcase the newly predicted HDR-VMAF metric in comparison to VMAF and other metric values for SDR content, and to conduct further validation with user testing.
Citations: 0
Vegvisir: A testing framework for HTTP/3 media streaming
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3592550
Joris Herbots, M. Vandersanden, P. Quax, W. Lamotte
Abstract: Assessing media streaming performance traditionally requires the presence of reproducible network conditions and a heterogeneous dataset of media materials. Setting up such experiments is a complex challenge in itself, and it becomes even more complex with the new QUIC transport protocol, which has many tunable features yet is difficult to analyze due to its inherently encrypted nature. In this paper, we introduce Vegvisir, an open-source automated testing framework for orchestrating media streaming experiments over HTTP/3 that addresses these challenges. We describe how users can steer the behavior of Vegvisir through its configuration system, and we provide a high-level overview of its inner workings and its broad applicability by describing two use cases: one covering sizeable experiments spanning multiple days and another covering HAS evaluation scenarios.
Citations: 1
EVASR: Edge-Based Video Delivery with Salience-Aware Super-Resolution
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3590967
Na Li, Yao Liu
Abstract: With the rapid growth of video content consumption, it is important to deliver high-quality streaming video to users even under limited network bandwidth. In this paper, we propose EVASR, a system that performs edge-based video delivery to clients with salience-aware super-resolution. We select patches with higher saliency scores for super-resolution while applying simple yet efficient bicubic interpolation to the remaining patches in the same video frame. To use the computation resources available at the edge server efficiently, we introduce a new metric called "saliency visual quality" and formulate patch selection as an optimization problem to achieve the best performance when an edge server serves multiple users. We implement EVASR based on the FFmpeg framework and conduct extensive experiments for evaluation. Results show that EVASR outperforms baseline approaches in both resource efficiency and visual quality metrics, including PSNR, saliency visual quality (SVQ), and VMAF.
Citations: 0
A Dynamic 3D Point Cloud Dataset for Immersive Applications
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3592546
Yuan-Chun Sun, I-Chun Huang, Yuang Shi, Wei Tsang Ooi, Chun-Ying Huang, Cheng-Hsin Hsu
Abstract: Motion estimation in a 3D point cloud sequence is a fundamental operation with many applications, including compression, error concealment, and temporal upscaling. While there have been multiple research contributions toward estimating the motion vectors of points between frames, there has been no dynamic 3D point cloud dataset with motion ground truth to benchmark against. In this paper, we present an open dynamic 3D point cloud dataset to fill this gap. Our dataset consists of synthetically generated objects with pre-determined motion patterns, allowing us to generate the motion vectors for the points. It contains nine objects in three categories (shape, avatar, and textile) with different animation patterns, and we also provide semantic segmentation of each avatar object. The dataset can be used by researchers who need temporal information across frames; as an example, we present an evaluation of two motion estimation methods using it.
Citations: 1
SEPE Dataset: 8K Video Sequences and Images for Analysis and Development
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3592560
Tariq Al Shoura, Ali Mollaahmadi Dehaghi, Reza Razavi, B. Far, Mohammad Moshirpour
Abstract: This paper provides an overview of our open SEPE (Software Engineering Practice and Education) 8K dataset, which comprises 40 different 8K (8192 x 4320) video sequences and 40 different 8K (8192 x 5464) images. The video sequences were captured at 29.97 frames per second (FPS) and encoded with the AVC/H.264, HEVC/H.265, and AV1 codecs at resolutions from 8K down to 480p. The images, video sequences, encoded videos, and various other statistics on the media that make up the dataset are published and maintained in a GitHub repository for non-commercial use. In this paper, the dataset components are described and analyzed using various methods. The proposed dataset is, as far as we know, the first to publish true 8K natural sequences; it is therefore important for the next generation of multimedia applications such as video quality assessment, super-resolution, video coding, and video compression. GitHub: https://github.com/talshoura/SEPE-8K-Dataset
Citations: 0
"You AR' right in front of me": RGBD-based capture and rendering for remote training “你的AR就在我面前”:基于rgbd的远程训练捕获和渲染
Proceedings of the 14th Conference on ACM Multimedia Systems Pub Date : 2023-06-07 DOI: 10.1145/3587819.3593936
S. Gunkel, S. Dijkstra-Soudarissanane, O. Niamut
{"title":"\"You AR' right in front of me\": RGBD-based capture and rendering for remote training","authors":"S. Gunkel, S. Dijkstra-Soudarissanane, O. Niamut","doi":"10.1145/3587819.3593936","DOIUrl":"https://doi.org/10.1145/3587819.3593936","url":null,"abstract":"Immersive technologies such as virtual reality have enabled novel forms of education and training, where students can learn new skills in simulated environments. But some specialized training procedures, e.g. ESA-certified soldering, still involve real-world physical processes with physical lab equipment. Such training sessions require students to travel to teaching labs and may interrupt everyday commitments for a longer period of time. There is a desire to make such training procedures more accessible remotely while keeping any student-to-teacher interaction natural, personal, and engaging. This paper presents a prototype for a remote teaching use case by rendering 3D photorealistic representations into the Augmented Reality (AR) glasses of a student. The teacher is captured with a modular RGBD capture application integrated into a web-based immersive communication platform. The integration offers multiple real-time capture calibration and rendering configurations. Our modular platform allows for an easy evaluation of different technical constraints as well as easy testing of the use case itself. Such evaluation may include a direct comparison of different 3D point-cloud and mesh rendering techniques. Additionally, the overall system allows immersive interaction between the student and the teacher, including augmented text messages for non-intrusive notifications. Our platform offers an ideal testbed for both technical and user-centered immersive communication studies.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131255841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Perceptual annotation of local distortions in videos: tools and datasets
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3592559
Andréas Pastor, P. Le Callet
Abstract: To assess the quality of multimedia content, create datasets, and train objective quality metrics, one needs to collect subjective opinions from annotators. Different subjective methodologies exist, from direct rating with single or double stimuli to indirect rating with pairwise comparisons. Triplet- and quadruplet-based comparisons are a form of indirect rating: from these preferences, the assessed stimuli can be placed on a perceptual scale (e.g., from low to high quality). The Maximum Likelihood Difference Scaling (MLDS) solver is one such algorithm, working with triplets and quadruplets; a participant is asked to compare intervals between pairs of stimuli, (a,b) and (c,d), where a, b, c, d are stimuli forming a quadruplet. One limitation, however, is that the perceptual scales retrieved from stimuli of different contents are usually not comparable. We previously proposed a solution that measures an inter-content scale across multiple contents. This paper presents an open-source Python implementation of the method and demonstrates its use on three datasets collected in an in-lab environment. We compare the accuracy and effectiveness of the method using pairwise, triplet, and quadruplet comparisons for intra-content annotations. The code is available at: https://github.com/andreaspastor/MLDS_inter_content_scaling
Citations: 0
Video Decoding Performance and Requirements for XR Applications
Pub Date: 2023-06-07 · DOI: 10.1145/3587819.3593940
Emmanouil Potetsianakis, E. Thomas
Abstract: Designing XR applications creates challenges in the performance and scaling of media decoding operations and in the composition and synchronization of the various assets. Going beyond the single-decoder paradigm of conventional video applications, XR applications tend to compose more and more visual streams, such as 2D video assets but also textures and 2D/3D graphics encoded in video streams. All this demands robust and predictable decoder management and dynamic buffer organization. However, the behaviour of multiple decoder instances running in parallel is not yet well understood on mobile platforms. To this end, we present VidBench, a parallel video decoding performance measurement tool for mobile Android devices. With VidBench, we quantify the challenges for applications using parallel video decoding pipelines through objective measurements, and we illustrate subjectively the current state of decoding multiple media streams and the visual artefacts that can result from unmanaged parallel video pipelines. Test results provide hints on the feasibility and potential performance gain of technologies such as MPEG-I Part 13, the Video Decoding Interface for immersive media (VDI), to alleviate these problems. We briefly present the main goals of VDI, standardised by SC29 WG3, the Moving Picture Experts Group (MPEG) Systems, which introduces functions and related constraints for optimizing such decoding instances, as well as the video decoding APIs on which VDI builds, such as the Khronos Vulkan Video extension.
Citations: 0
Color-aware Deep Temporal Backdrop Duplex Matting System
Pub Date: 2023-06-05 · DOI: 10.1145/3587819.3590973
Hendrik Hachmann, B. Rosenhahn
Abstract: Deep learning-based alpha matting has shown tremendous improvements in recent years, yet feature film production studios still rely on classical chroma keying, including costly post-production steps. This discrepancy can be explained by missing links necessary for production that are currently not adequately addressed by the alpha matting community, in particular foreground color estimation and color spill compensation. We propose a neural network-based temporal multi-backdrop production system that combines beneficial features of chroma keying and alpha matting. Given two consecutive frames with different background colors, our one-encoder-dual-decoder network predicts foreground colors and alpha values using a patch-based overlap-blend approach. The system can handle imprecise backdrops, dynamic cameras, and dynamic foregrounds, and places no restrictions on foreground colors. We compare our method to state-of-the-art algorithms using benchmark datasets and a video sequence captured with a demonstrator setup, and we verify that a dual-backdrop input is superior to the usually applied trimap-based approach. In addition, the proposed studio set is actor-friendly and produces high-quality, temporally consistent alpha and color estimates with superior color spill compensation.
Citations: 0
TotalDefMeme: A Multi-Attribute Meme dataset on Total Defence in Singapore
Pub Date: 2023-05-29 · DOI: 10.1145/3587819.3592545
Nirmalendu Prakash, Ming Shan Hee, R. Lee
Abstract: Total Defence is a defence policy that combines and extends the concepts of military defence and civil defence. While several countries have adopted total defence as their defence policy, very few studies have investigated its effectiveness. With the rapid proliferation of social media and digitalisation, many social studies have focused on investigating policy effectiveness through specially curated surveys and questionnaires, whether in digital media or traditional forms. However, such instruments may not truly reflect underlying sentiment toward the target policies or initiatives of interest: people are more likely to express their sentiment through communication mediums such as starting topic threads on forums or sharing memes on social media. Using Singapore as a case reference, this study addresses this research gap by proposing TotalDefMeme, a large-scale multi-modal and multi-attribute meme dataset that captures public sentiment toward Singapore's Total Defence policy. Besides supporting social informatics and public policy analysis of the Total Defence policy, TotalDefMeme can also support many downstream multi-modal machine learning tasks, such as aspect-based stance classification and multi-modal meme clustering. We perform baseline machine learning experiments on TotalDefMeme to evaluate its technical validity, and we present possible future interdisciplinary research directions and application scenarios that use the dataset as a baseline.
Citations: 3