{"title":"COSMOS on Steroids: a Cheap Detector for Cheapfakes","authors":"Tankut Akgul, T. Civelek, Deniz Ugur, A. Begen","doi":"10.1145/3458305.3479968","DOIUrl":"https://doi.org/10.1145/3458305.3479968","url":null,"abstract":"The growing prevalence of visual disinformation has become an important problem to solve nowadays. Cheapfake is a new term used for the altered media generated by non-AI techniques. In their recent COSMOS work, the authors developed a self-supervised training strategy that detected whether different captions for a given image were out-of-context, meaning that even though pointing to the same object(s) in the image, the captions implied different meanings. In this paper, we propose four methods to improve the detection accuracy of COSMOS. These methods range from differential sensing and fake-or-fact checking that detect contradicting or fake captions to object-caption matching and threshold adjustment that modify the baseline algorithm for improved accuracy.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130713744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"uvgVenctester: Open-Source Test Automation Framework for Comprehensive Video Encoder Benchmarking","authors":"Joose Sainio, Alexandre Mercat, Jarno Vanne","doi":"10.1145/3458305.3478445","DOIUrl":"https://doi.org/10.1145/3458305.3478445","url":null,"abstract":"The agile and efficient development of modern video encoders calls for automated testing methodologies. This paper presents the first-of-its-kind open-source test automation framework called uvgVenctester (github.com/ultravideo/uvgVenctester) that is designed for comprehensive performance and conformance testing of video encoders with the desired set of test video sequences. Our framework comes with built-in support for the popular AVC, HEVC, VVC, VP9, and AV1 video coding formats and the state-of-the-art HM, Kvazaar, x265, VTM, VVenC, SVT-VP9, and SVT-AV1 video encoders. Furthermore, there are no technical limitations on adopting other formats or encoders. The developers can evaluate the encoder of interest under the three primary usage scenarios: 1) conformance testing of the encoded bitstream; 2) rate-distortion-complexity comparison with the other encoders; and 3) systematic exploration of encoding parameters. The framework provides commonly used analysis tools to quantify encoding quality, speed, and bitrate with a versatile set of absolute and comparative results such as Bjøntegaard Delta (BD)-Rate for PSNR, SSIM, and VMAF quality metrics. The supported output formats include CSV, graph, and comparison table. They ensure that the results are available in human- and machine-readable formats. To the best of our knowledge, the proposed framework is currently the most comprehensive and modular open-source software toolset for video encoder benchmarking.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122786236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In-Network Scalable Video Adaption Using Big Packet Protocol","authors":"S. Clayman, M. Sayıt","doi":"10.1145/3458305.3478440","DOIUrl":"https://doi.org/10.1145/3458305.3478440","url":null,"abstract":"The essence of this work is to show how SVC Scalable Video can be adapted in the network in an effective way, when the Big Packet Protocol (BPP) is used. This demo shows the advantages of BPP, which is a recently proposed transport protocol devised for real-time applications. We will show that in-network adaption can be provided using this new protocol. We show how a network node can change the packets during their transmission, but still present a very usable video stream to the client. The preliminary results show that BPP is a good alternative transport for video transmission.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"3583 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127520838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PePa Ping Dataset: Comprehensive Contextualization of Periodic Passive Ping in Wireless Networks","authors":"Diego Madariaga, Lucas Torrealba, Javier Madariaga, Javier Bustos-Jiménez, B. Bustos","doi":"10.1145/3458305.3478456","DOIUrl":"https://doi.org/10.1145/3458305.3478456","url":null,"abstract":"Among all Internet Quality of Service (QoS) indicators, Round-trip time (RTT), jitter and packet loss have been thoroughly studied due to their great impact on the overall network's performance and the Quality of Experience (QoE) perceived by the users. Considering that, we managed to generate a real-world dataset with a comprehensive contextualization of these important quality indicators by passively monitoring the network in user-space. To generate this dataset, we first developed a novel Periodic Passive Ping (PePa Ping) methodology for Android devices. Contrary to other works, PePa Ping periodically obtains RTT, jitter, and number of lost packets of all TCP connections. This passive approach relies on the implementation of a local VPN server residing inside the client device to manage all Internet traffic and obtain QoS information of the connections established. The collected QoS indicators are provided directly by the Linux kernel, and therefore, they are exceptionally close to real QoS values experienced by users' devices. Additionally, the PePa Ping application continuously measured other indicators related to each individual network flow, the state of the device, and the state of the Internet connection (either WiFi or Mobile). With all the collected information, each network flow can be precisely linked to a set of environmental data that provides a comprehensive contextualization of each individual connection.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128161990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Based Video Coding Enhancements for HTTP Adaptive Streaming","authors":"Ekrem Çetinkaya","doi":"10.1145/3458305.3478468","DOIUrl":"https://doi.org/10.1145/3458305.3478468","url":null,"abstract":"Video traffic comprises the majority of today's Internet traffic, and HTTP Adaptive Streaming (HAS) is the preferred method to deliver video content over the Internet. Increasing demand for video and the improvements in the video display conditions over the years caused an increase in the video coding complexity. This increased complexity brought the need for more efficient video streaming and coding solutions. The latest standard video codecs can reduce the size of the videos by using more efficient tools with higher time-complexities. The plans for integrating machine learning into upcoming video codecs raised the interest in applied machine learning for video coding. In this doctoral study, we aim to propose applied machine learning methods to video coding, focusing on HTTP adaptive streaming. We present four primary research questions to target different challenges in video coding for HTTP adaptive streaming.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121230744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Receiver-side Congestion Control Scheme for Web Real-time Communication","authors":"Bo Wang, Yuan Zhang, Si-Ze Qian, Zipeng Pan, Yuhong Xie","doi":"10.1145/3458305.3479970","DOIUrl":"https://doi.org/10.1145/3458305.3479970","url":null,"abstract":"Web real-time communication (WebRTC) employs congestion control to ensure the quality of experience (QoE). Different from congestion control schemes for TCP, WebRTC keeps a low-level playback buffer that considers excessively delayed packets as losses, which makes the congestion control for WebRTC more challenging. Existing heuristic schemes estimate the network conditions based on hand-crafted rules that may be suboptimal, leading to under-utilization or over-utilization of link capacity in many cases. On the other hand, the existing learning-based schemes train a model that acts in a large action space, which is hard to converge to a stable status and has low performance over unpredictable network conditions. In this paper, we propose a hybrid receiver-side congestion control (HRCC) framework, which combines a heuristic congestion control scheme with an RL-Agent that periodically generates a gain coefficient to tune the bandwidth estimated by the heuristic scheme. Extensive simulation experiments demonstrate that the HRCC's RL-Agent effectively tunes the bandwidth estimate of the heuristic scheme. The hybrid scheme achieves higher bandwidth utilization than the fully heuristic scheme with similar queuing delay and packet loss and outperforms the fully RL-based scheme on overall performance.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123288633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Standards-based Streaming Analytics and its Visualization","authors":"S. Pham, M. Avelino, D. Silhavy, Troung-Sinh An, S. Arbanowski","doi":"10.1145/3458305.3478438","DOIUrl":"https://doi.org/10.1145/3458305.3478438","url":null,"abstract":"As OTT (over-the-top) media streaming and underlying technologies have matured, streaming analytics has become more important, especially in a heterogeneous device ecosystem, where new devices or software updates can potentially cause streaming issues. In this paper we consider SAND (Server and Network Assisted DASH), CMCD (Common Media Client Data) and Streaming Quality of Experience Events, Properties and Metrics (CTA-2066) as standards to enable interoperable, standard-based streaming analytics for the predominant streaming formats MPEG-DASH and HLS. We focus on the visualization aspect of streaming metrics in UI (user interface) dashboards.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132669973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"4DLFVD","authors":"Xinjue Hu, Chen-chao Wang, Yuxuan Pan, Yunming Liu, Yumei Wang, Yu Liu, Lin Zhang, S. Shirmohammadi","doi":"10.1145/3458305.3478450","DOIUrl":"https://doi.org/10.1145/3458305.3478450","url":null,"abstract":"We present a 4D Light Field (LF) video dataset, collected by a custom-made camera matrix, to be used for designing and testing algorithms and systems for LF video coding, processing, and streaming. Compared to existing LF datasets, ours provides LF videos, as opposed to only images, and at higher frame resolution, higher number of viewpoints, and/or higher framerate, offering the best visual quality LF video dataset. To achieve this, we built a 10 x 10 LF capture matrix composed of 100 cameras, each with a 1920 x 1056 resolution. We used this matrix to record videos in real and varying illumination and scene dynamics conditions. The dataset contains a total of nine groups of LF videos: eight groups collected with a fixed camera matrix position and orientation recording indoor potted plants, furniture, etc., and the last group collected by rotating around an outdoor environment with roadside vehicles, pedestrians, etc. Each group of LF videos consists of 100 video streams encoded with H.265/HEVC. Scene changes vary from static to slightly dynamic to highly dynamic, providing a good level of diversity. As an example, we present the results of a depth estimation method and show that our dataset can be used for applications such as object detection, 3D modeling, and others.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128333598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HYPERAKTIV","authors":"S. Hicks, A. Stautland, Ole Bernt Fasmer, Wenche Førland, H. Hammer, P. Halvorsen, K. Mjeldheim, K. Oedegaard, B. Osnes, Vigdis Elin Giæver Syrstad, M. Riegler, P. Jakobsen","doi":"10.1145/3458305.3478454","DOIUrl":"https://doi.org/10.1145/3458305.3478454","url":null,"abstract":"Machine learning research within healthcare frequently lacks the public data needed to be fully reproducible and comparable. Datasets are often restricted due to privacy concerns and legal requirements that come with patient-related data. Consequently, many algorithms and models get published on the same topic without a standard benchmark to measure against. Therefore, this paper presents HYPERAKTIV, a public dataset containing health, activity, and heart rate data from patients diagnosed with attention deficit hyperactivity disorder, better known as ADHD. The dataset consists of data collected from 51 patients with ADHD and 52 clinical controls. In addition to the activity and heart rate data, we also include a series of patient attributes such as their age, sex, and information about their mental state, as well as output data from a computerized neuropsychological test. Together with the presented dataset, we also provide baseline experiments using traditional machine learning algorithms to predict ADHD based on the included activity data. We hope that this dataset can be used as a starting point for computer scientists who want to contribute to the field of mental health, and as a common benchmark for future work in ADHD analysis.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124492549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CDN and SDN Support and Player Interaction for HTTP Adaptive Video Streaming","authors":"R. Farahani","doi":"10.1145/3458305.3478464","DOIUrl":"https://doi.org/10.1145/3458305.3478464","url":null,"abstract":"Video streaming has become one of the most prevailing, bandwidth-hungry, and latency-sensitive Internet applications. HTTP Adaptive Streaming (HAS) has become the dominant video delivery mechanism over the Internet. Lack of coordination among the clients and lack of awareness of the network in pure client-based adaptive video bitrate approaches have caused problems, such as sub-optimal data throughput from Content Delivery Network (CDN) or origin servers, high CDN costs, and non-satisfactory users' experience. Recent studies have shown that network-assisted HAS techniques that utilize modern networking paradigms, e.g., Software Defined Networking (SDN), Network Function Virtualization (NFV), and edge computing, can significantly improve HAS system performance. In this doctoral study, we leverage the aforementioned modern networking paradigms and design network-assistance for/by HAS clients to improve HAS systems performance and CDN/network utilization. We present four fundamental research questions to target different challenges in devising a network-assisted HAS system.","PeriodicalId":138399,"journal":{"name":"Proceedings of the 12th ACM Multimedia Systems Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116718136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}