{"title":"Factors Influencing Video Quality of Experience in Ecologically Valid Experiments: Measurements and a Theoretical Model","authors":"Kamil Koniuch","doi":"10.1145/3587819.3593027","DOIUrl":"https://doi.org/10.1145/3587819.3593027","url":null,"abstract":"Users' perception of multimedia quality and satisfaction with multimedia services are the subject of various studies in the field of Quality of Experience (QoE). In this respect, subjective studies of quality represent an important part of the multimedia optimization process. However, researchers who measure QoE have to face its multidimensional character and address the fact that quality perception is influenced by numerous factors. To address this issue, experiments measuring QoE often limit the scope of factors influencing subjective judgments by administering laboratory protocols. However, the generalizability of results gathered with such protocols is limited. The proposed PhD dissertation aims to address this challenge. To increase the generalizability of QoE studies, we began by identifying the factors that influence users' multimedia experience in a natural context. We then proposed a new theoretical model of video QoE based on both original research and a literature review. This theoretical framework allowed us to propose new experimental designs that introduce influencing factors one by one, in an additive manner. Thanks to the model, we can also propose comparable experiments that differ in ecological validity. The proposed theoretical framework can be adjusted to other multimedia in the future.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"422 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122512197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems","authors":"Luis Carvalho, Tobias Washüttl, G. Widmer","doi":"10.1145/3587819.3590968","DOIUrl":"https://doi.org/10.1145/3587819.3590968","url":null,"abstract":"Linking sheet music images to audio recordings remains a key problem for the development of efficient cross-modal music retrieval systems. One of the fundamental approaches toward this task is to learn, via deep neural networks, a cross-modal embedding space that is able to connect short snippets of audio and sheet music. However, the scarcity of annotated data from real musical content affects the capability of such methods to generalize to real retrieval scenarios. In this work, we investigate whether we can mitigate this limitation with self-supervised contrastive learning, by exposing a network to a large amount of real music data as a pre-training step, contrasting randomly augmented views of snippets of both modalities, namely audio and sheet images. Through a number of experiments on synthetic and real piano data, we show that pre-trained models are able to retrieve snippets with better precision in all scenarios and pre-training configurations. Encouraged by these results, we employ the snippet embeddings in the higher-level task of cross-modal piece identification and conduct more experiments on several retrieval configurations. In this task, we observe that the retrieval quality improves from 30% up to 100% when real music data is present. We then conclude by arguing for the potential of self-supervised contrastive learning for alleviating the annotated data scarcity in multi-modal music retrieval models. Code and trained models are accessible at https://github.com/luisfvc/ucasr.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122561685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
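The cross-modal contrastive pre-training idea above can be illustrated with a symmetric InfoNCE-style loss over a batch of paired audio/sheet-snippet embeddings: matching pairs are positives, every other pairing in the batch is a negative. This is a generic NumPy sketch of the technique, not the paper's actual network, augmentations, or hyperparameters (those live in the linked repository).

```python
import numpy as np

def info_nce_loss(audio_emb, sheet_emb, temperature=0.1):
    """Symmetric InfoNCE loss for paired audio/sheet-snippet embeddings.

    audio_emb, sheet_emb: (batch, dim) arrays where row i of each array
    comes from the same musical passage (the positive pair).
    """
    # L2-normalize so dot products become cosine similarities
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    s = sheet_emb / np.linalg.norm(sheet_emb, axis=1, keepdims=True)
    logits = a @ s.T / temperature        # (batch, batch) similarity matrix
    idx = np.arange(len(a))               # positives lie on the diagonal

    def xent(lg):
        # numerically stable cross-entropy against the diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average both retrieval directions: audio->sheet and sheet->audio
    return (xent(logits) + xent(logits.T)) / 2
```

Correctly paired embeddings should score a much lower loss than mismatched ones, which is the signal the pre-training step exploits.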
{"title":"The ADΔER Framework: Tools for Event Video Representations","authors":"Andrew C. Freeman","doi":"10.1145/3587819.3593028","DOIUrl":"https://doi.org/10.1145/3587819.3593028","url":null,"abstract":"The concept of \"video\" is synonymous with frame-sequence image representations. However, neuromorphic \"event\" cameras, which are rapidly gaining adoption for computer vision tasks, record frameless video. We believe that these different paradigms of video capture can each benefit from the lessons of the other. To usher in the next era of video systems and accommodate new event camera designs, we argue that we will need an asynchronous, source-agnostic processing pipeline. In this paper, we propose an end-to-end framework for frameless video, and we describe its modularity and its amenability to compression and to both existing and future applications.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115685817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QoE- and Energy-aware Content Consumption For HTTP Adaptive Streaming","authors":"Daniele Lorenzi","doi":"10.1145/3587819.3593029","DOIUrl":"https://doi.org/10.1145/3587819.3593029","url":null,"abstract":"Video streaming services account for the majority of today's Internet traffic, and according to recent studies, this share is expected to keep growing. Given this broad utilization, research in video streaming is moving towards energy-aware approaches, which aim at reducing the energy consumption of the devices involved in the streaming process. At the same time, the quality perceived by the user plays an important role, and the advent of HTTP Adaptive Streaming (HAS) changed the way quality is assessed: the focus is no longer exclusively on the Quality of Service (QoS) but rather on the Quality of Experience (QoE) of the user taking part in the streaming session. Therefore, video streaming services need to develop Adaptive BitRate (ABR) techniques to deal with varying network conditions on the client side, or appropriate end-to-end strategies to provide high QoE to the users. The scope of this doctoral study is the end-to-end environment, with a focus on the end-user's domain, referred to as the player environment, including video content consumption and interactivity. This thesis aims to investigate and develop techniques to increase the QoE delivered to users and to minimize the energy consumption of end devices in the HAS context. We present four main research questions to target the related challenges in the domain of content consumption for HAS systems.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121422035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
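The QoE/energy trade-off an energy-aware ABR client faces can be sketched with a toy utility function: pick the highest representation the network can sustain, discounted by an energy cost. The linear utility, parameter names, and weights below are illustrative assumptions for exposition, not the thesis's actual model.

```python
def select_bitrate(bitrates_kbps, throughput_kbps, energy_per_kbps,
                   energy_weight=0.5):
    """Pick a bitrate balancing a crude QoE proxy against an energy cost.

    bitrates_kbps:    available representation bitrates (the HAS ladder)
    throughput_kbps:  estimated sustainable network throughput
    energy_per_kbps:  hypothetical per-kbps decode/display energy proxy
    energy_weight:    how strongly energy is penalized relative to quality
    """
    best, best_utility = None, float("-inf")
    for b in bitrates_kbps:
        if b > throughput_kbps:
            continue  # skip levels the network cannot sustain
        quality = b / max(bitrates_kbps)  # QoE proxy, normalized to [0, 1]
        energy = energy_per_kbps * b      # energy proxy grows with bitrate
        utility = quality - energy_weight * energy
        if utility > best_utility:
            best, best_utility = b, utility
    return best
```

With the energy term zeroed out this degenerates to classic throughput-based ABR (highest sustainable bitrate); raising the energy cost pushes the choice down the ladder, which is the behavior an energy-aware client trades QoE for.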
{"title":"A Dataset for User Visual Behaviour with Multi-View Video Content","authors":"Tiago Soares da Costa, M. T. Andrade, Paula Viana, Nuno Castro Silva","doi":"10.1145/3587819.3592556","DOIUrl":"https://doi.org/10.1145/3587819.3592556","url":null,"abstract":"Immersive video applications impose impractical bandwidth requirements on best-effort networks. With Multi-View (MV) streaming, these can be minimized by resorting to view-prediction techniques. SmoothMV is a multi-view system that uses a non-intrusive head-tracking mechanism to detect the viewer's interest and select appropriate views. By coupling it with Neural Networks (NNs) that anticipate the viewer's interest, view-switching latency is likely to be reduced. The objective of this paper is twofold: 1) present a solution for acquiring gaze data from users viewing MV content; 2) describe a dataset, collected with a large-scale testbed, that can be used to train NNs to predict the user's viewing interest. Tracking data from head movements was obtained from 45 participants using an Intel RealSense F200 camera, with 7 video playlists, each viewed a minimum of 17 times. This dataset is publicly available to the research community and constitutes an important contribution to reducing the current scarcity of such data. Tools to obtain saliency/heat maps and generate complementary plots are also provided as an open-source software package.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125075065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Everybody Compose: Deep Beats To Music","authors":"Conghao Shen, Violet Z. Yao, Yixin Liu","doi":"10.1145/3587819.3592542","DOIUrl":"https://doi.org/10.1145/3587819.3592542","url":null,"abstract":"This project presents a deep learning approach to generate monophonic melodies based on input beats, allowing even amateurs to create their own music compositions. Three effective methods - LSTM with Full Attention, LSTM with Local Attention, and Transformer with Relative Position Representation - are proposed for this novel task, providing great variation, harmony, and structure in the generated music. This project allows anyone to compose their own music by tapping their keyboards or \"recoloring\" beat sequences from existing works.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128165584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IDCIA: Immunocytochemistry Dataset for Cellular Image Analysis","authors":"Abdurahman Ali Mohammed, Catherine Fonder, D. Sakaguchi, Wallapak Tavanapong, S. Mallapragada, A. Idris","doi":"10.1145/3587819.3592558","DOIUrl":"https://doi.org/10.1145/3587819.3592558","url":null,"abstract":"We present a new annotated microscopic cellular image dataset to improve the effectiveness of machine learning methods for cellular image analysis. Cell counting is an important step in cell analysis. Typically, domain experts manually count cells in a microscopic image. Automated cell counting can potentially eliminate this tedious, time-consuming process. However, a good, labeled dataset is required for training an accurate machine learning model. Our dataset includes microscopic images of cells, and for each image, the cell count and the locations of individual cells. The data were collected as part of an ongoing study investigating the potential of electrical stimulation to modulate stem cell differentiation and possible applications for neural repair. Compared to existing publicly available datasets, our dataset has more images of cells stained with a wider variety of antibodies (protein components of immune responses against invaders) typically used for cell analysis. The experimental results on this dataset indicate that none of the five existing models under this study are able to achieve sufficiently accurate counts to replace manual counting. The dataset is available at https://figshare.com/articles/dataset/Dataset/21970604.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132210735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
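Judging whether an automated counter is "sufficiently accurate" requires comparing its per-image counts against the expert annotations shipped with a dataset like this one. A common yardstick is mean absolute error over images; the metric choice here is illustrative, not necessarily the one used in the paper.

```python
def count_mae(predicted_counts, true_counts):
    """Mean absolute error between model-predicted and manually
    annotated cell counts, averaged over images."""
    if len(predicted_counts) != len(true_counts):
        raise ValueError("one predicted count per annotated image required")
    return sum(abs(p - t)
               for p, t in zip(predicted_counts, true_counts)) / len(true_counts)
```

A model would replace manual counting only if this error stays within whatever tolerance the downstream biological analysis can absorb.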
{"title":"VOLVQAD: An MPEG V-PCC Volumetric Video Quality Assessment Dataset","authors":"Samuel Rhys Cox, May Lim, Wei Tsang Ooi","doi":"10.1145/3587819.3592543","DOIUrl":"https://doi.org/10.1145/3587819.3592543","url":null,"abstract":"We present VOLVQAD, a volumetric video quality assessment dataset consisting of 7,680 ratings on 376 video sequences from 120 participants. The volumetric video sequences are first encoded with MPEG V-PCC using 4 different avatar models and 16 quality variations, and then rendered into test videos for quality assessment using 2 different background colors and 16 different quality switching patterns. The dataset is useful for researchers who wish to understand the impact of volumetric video compression on subjective quality. Analysis of the collected data is also presented in this paper.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121210017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VAST: A Decentralized Open-Source Publish/Subscribe Architecture","authors":"Victory Opeolu, H. Engelbrecht, Shun-Yun Hu, C. Marais","doi":"10.1145/3587819.3592554","DOIUrl":"https://doi.org/10.1145/3587819.3592554","url":null,"abstract":"Publish/Subscribe (pub/sub) systems have been widely adopted in highly scalable environments, especially in IoT/IIoT applications, which must operate under low bandwidth and high latency. With IoT/IIoT network nodes projected to number in the billions within the next few years, there is a need for network communication standards that can adapt to the ever-growing nature of this industry. While current pub/sub standards have produced positive results so far, they all adopt a \"topic\"-based pub/sub approach and do not leverage the spatial information available on modern devices. Current open-source standards also focus heavily on centralized brokering of information, which makes the broker a potential bottleneck and a single point of failure: if that broker goes down, the entire network goes down. We have developed a new open-source pub/sub standard called VAST that leverages the spatial information of modern network devices to perform message communication, built around a concept called Spatial Publish/Subscribe (SPS). It runs on a peer-to-peer network to enable high scalability. In addition, it provides a Voronoi overlay to distribute messages efficiently, ensuring that network brokers are not overloaded with requests and that the network self-organizes if one or more brokers fail, and it includes a forwarding algorithm to eliminate redundancies in the network. We will demonstrate this concept with a simulator we developed, showing how the simulator works and how to use it. We believe this simulator will encourage researchers to adopt this technology for their spatial applications. An example of such an application is Massively Multi-user Virtual Environments (MMVEs), which require a high number of spatial network nodes in virtual environments.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129099023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
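The Spatial Publish/Subscribe (SPS) idea — subscribers declare an area of interest (AOI) and receive only messages published inside it, rather than matching on topic strings — can be shown with a toy in-process sketch. This illustrates only the matching semantics; VAST's peer-to-peer Voronoi overlay, brokers, and forwarding algorithm are not modeled here, and all names are hypothetical.

```python
import math

class SpatialPubSub:
    """Minimal spatial pub/sub: delivery is decided by geometry, not topics."""

    def __init__(self):
        self.subs = []  # list of (callback, x, y, aoi_radius)

    def subscribe(self, callback, x, y, radius):
        """Register a subscriber with a circular area of interest."""
        self.subs.append((callback, x, y, radius))

    def publish(self, x, y, message):
        """Deliver message to every subscriber whose AOI contains (x, y).

        Returns the number of deliveries made.
        """
        delivered = 0
        for cb, sx, sy, r in self.subs:
            if math.hypot(x - sx, y - sy) <= r:  # point inside the AOI circle
                cb(message)
                delivered += 1
        return delivered

# Usage: a subscriber at the origin with AOI radius 10 hears nearby
# publications and ignores distant ones.
bus = SpatialPubSub()
inbox = []
bus.subscribe(inbox.append, 0, 0, 10)
near = bus.publish(3, 4, "near")   # distance 5, inside the AOI
far = bus.publish(30, 40, "far")   # distance 50, outside the AOI
```

In a decentralized system like VAST, the matching step above is distributed across peers, with a Voronoi partition deciding which node is responsible for which region.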
{"title":"FleXR: A System Enabling Flexibly Distributed Extended Reality","authors":"Jin Heo, Ketan Bhardwaj, Ada Gavrilovska","doi":"10.1145/3587819.3590966","DOIUrl":"https://doi.org/10.1145/3587819.3590966","url":null,"abstract":"Extended reality (XR) applications require computationally demanding functionalities with low end-to-end latency and high throughput. To enable XR on commodity devices, a number of distributed systems solutions enable offloading of XR workloads to remote servers. However, they make a priori decisions regarding the offloaded functionalities based on assumptions about operating factors, and their benefits are restricted to specific deployment contexts. To realize the benefits of offloading in various distributed environments, we present a distributed stream processing system, FleXR, which is specialized for real-time and interactive workloads and enables flexible distribution of XR functionalities. In building FleXR, we identified and resolved several issues of presenting XR functionalities as distributed pipelines. FleXR provides a framework for flexible distribution of XR pipelines while streamlining the development and deployment phases. We evaluate FleXR with three XR use cases in four different distribution scenarios. In the results, the best-case distribution scenario shows up to 50% lower end-to-end latency and up to 3.9x the pipeline throughput of alternatives.","PeriodicalId":330983,"journal":{"name":"Proceedings of the 14th Conference on ACM Multimedia Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127311708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}