Title: A Discriminative and Compact Audio Representation for Event Detection
Authors: L. Jing, Bo Liu, Jaeyoung Choi, Adam L. Janin, Julia Bernd, Michael W. Mahoney, G. Friedland
DOI: https://doi.org/10.1145/2964284.2970377
Abstract: This paper presents a novel two-phase method for audio representation: Discriminative and Compact Audio Representation (DCAR). In the first phase, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on Grassmannian manifolds, which we found represents the structure of audio effectively. Experimental results on the YLI-MED dataset show that the proposed DCAR representation consistently outperforms state-of-the-art audio representations: i-vector, mv-vector, and GMM.

Title: SuperStreamer
Authors: Yong Xue Eu, Jermyn Tanu, Justin Jieting Law, Muhammad Hanif B Ghazali, Shuan Siang Tay, Wei Tsang Ooi, A. Bhojan
DOI: https://doi.org/10.1145/2964284.2973827
Abstract: This technical demonstration presents the SuperStreamer project, which enables progressive streaming of game assets to players while games are played, reducing the startup time required to download and start playing a cloud-based game. SuperStreamer modifies a popular game engine, Unreal Engine 4, to support developing and playing games with progressive game asset streaming. With SuperStreamer, developers can mark the minimal set of files containing only the game content essential to start playing the game. When a player starts the game, this minimal set of files is downloaded to the player's device. SuperStreamer also generates low-resolution textures automatically when a developer publishes a game, and these low-resolution textures are transmitted to the game client first. As players move through a game level, the high-quality textures required for the game are downloaded. In our demo game, we are able to decrease the time taken to start up and load the first game level by around 30%.
{"title":"High-speed Depth Stream Generation from a Hybrid Camera","authors":"X. Zuo, Sen Wang, Jiangbin Zheng, Ruigang Yang","doi":"10.1145/2964284.2964305","DOIUrl":"https://doi.org/10.1145/2964284.2964305","url":null,"abstract":"High-speed video has been commonly adopted in consumer-grade cameras, augmenting these videos with a corresponding depth stream will enable new multimedia applications, such as 3D slow-motion video. In this paper, we present a hybrid camera system that combines a high-speed color camera with a depth sensor, e.g. Kinect depth sensor, to generate a depth stream that can produce both high-speed and high-resolution RGB+depth stream. Simply interpolating the low-speed depth frames is not satisfactory, where interpolation artifacts and lose in surface details are often visible. We have developed a novel framework that utilizes both shading constraints within each frame and optical flow constraints between neighboring frames. More specifically we present (a) an effective method to find the intrinsics images to allow more accurate normal estimation; and (b) an optimization-based framework to estimate the high-resolution/high-speed depth stream, taking into consideration temporal smoothness and shading/depth consistency. We evaluated our holistic framework with both synthetic and real sequences, it showed superior performance than previous state-of-the-art.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132051057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model","authors":"Jingyuan Chen, Xuemeng Song, Liqiang Nie, Xiang Wang, Hanwang Zhang, Tat-Seng Chua","doi":"10.1145/2964284.2964314","DOIUrl":"https://doi.org/10.1145/2964284.2964314","url":null,"abstract":"Micro-videos, a new form of user generated contents (UGCs), are gaining increasing enthusiasm. Popular micro-videos have enormous commercial potential in many ways, such as online marketing and brand tracking. In fact, the popularity prediction of traditional UGCs including tweets, web images, and long videos, has achieved good theoretical underpinnings and great practical success. However, little research has thus far been conducted to predict the popularity of the bite-sized videos. This task is non-trivial due to three reasons: 1) micro-videos are short in duration and of low quality; 2) they can be described by multiple heterogeneous channels, spanning from social, visual, acoustic to textual modalities; and 3) there are no available benchmark dataset and discriminant features that are suitable for this task. Towards this end, we present a transductive multi-modal learning model. The proposed model is designed to find the optimal latent common space, unifying and preserving information from different modalities, whereby micro-videos can be better represented. This latent space can be used to alleviate the information insufficiency problem caused by the brief nature of micro-videos. In addition, we built a benchmark dataset and extracted a rich set of popularity-oriented features to characterize the popular micro-videos. Extensive experiments have demonstrated the effectiveness of the proposed model. As a side contribution, we have released the dataset, codes and parameters to facilitate other researchers.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132247821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: HEVC-compliant Tile-based Streaming of Panoramic Video for Virtual Reality Applications
Authors: Alireza Zare, A. Aminlou, M. Hannuksela, M. Gabbouj
DOI: https://doi.org/10.1145/2964284.2967292
Abstract: Delivering wide-angle, high-resolution spherical panoramic video content entails a high streaming bitrate. This poses challenges when panorama clips are consumed on virtual reality (VR) head-mounted displays (HMDs), because HMDs typically require content of high spatial and temporal fidelity and strictly low latency in order to preserve the user's sense of presence. To alleviate the problem, we propose storing two versions of the same video content at different resolutions, each divided into multiple tiles using the High Efficiency Video Coding (HEVC) standard. According to the user's current viewport, a set of tiles is transmitted at the highest captured resolution, while the remaining parts are transmitted from the low-resolution version of the same content. To enable arbitrary combinations of tiles to be chosen, the tile sets are encoded to be independently decodable. We further study the trade-offs in the choice of tiling scheme and its impact on compression and streaming bitrate performance. The results indicate streaming bitrate savings of 30% to 40%, depending on the selected tiling scheme, compared to streaming the entire video content.
{"title":"WorkCache: Salvaging siloed knowledge","authors":"S. Carter, Laurent Denoue, Matthew L. Cooper","doi":"10.1145/2964284.2973809","DOIUrl":"https://doi.org/10.1145/2964284.2973809","url":null,"abstract":"The proliferation of workplace multimedia collaboration applications has meant on one hand more opportunities for group work but on the other more data locked away in proprietary interfaces. We are developing new tools to capture and access multimedia content from any source. In this demo, we focus primarily on new methods that allow users to rapidly reconstitute, enhance, and share document-based information.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132607746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MP3DG-PCC, Open Source Software Framework for Implementation and Evaluation of Point Cloud Compression","authors":"R. Mekuria, Pablo César","doi":"10.1145/2964284.2973806","DOIUrl":"https://doi.org/10.1145/2964284.2973806","url":null,"abstract":"We present MP3DG-PCC, an open source framework for design, implementation and evaluation of point cloud compression algorithms. The framework includes objective quality metrics, lossy and lossless anchor codecs, and a test bench for consistent comparative evaluation. The framework and proposed methodology is in use for the development of an international point cloud compression standard in MPEG. In addition, the library is integrated with the popular point cloud library, making a large number of point cloud processing available and aligning the work with the broader open source community.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130859821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Key Color Generation for Affective Multimedia Production: An Initial Method and Its Application","authors":"Eunjin Kim, Hyeon‐Jeong Suk","doi":"10.1145/2964284.2964323","DOIUrl":"https://doi.org/10.1145/2964284.2964323","url":null,"abstract":"In this paper, we introduce a method that generates a key color to construct an aesthetic and affective harmony with visual content. Given an image and an affective term, our method creates a key color by combining a dominant hue of the image and a unique tone associated with the affective word. To match each affective term with a specific tone, we collected color themes from a crowd-sourced database and identified the most popular tone of color themes that are relevant to each affective term. The results of a user test showed that the method generates satisfactory key colors as much as designers do. Finally, as a prospective application, we employed our method to a promotional video editing prototype. Our method automatically generates a key color based on a frame of an input video and apply the color to a shape that delivers a promotional message. A second user study verifies that the video editing prototype with our method can effectively deliver the desired affective state with a satisfactory quality.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133513591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Correlation Features for Image Style Classification","authors":"W. Chu, Yi-Ling Wu","doi":"10.1145/2964284.2967251","DOIUrl":"https://doi.org/10.1145/2964284.2967251","url":null,"abstract":"This paper presents a comprehensive study of deep correlation features on image style classification. Inspired by that correlation between feature maps can effectively describe image texture, we design and transform various such correlations into style vectors, and investigate classification performance brought by different variants. In addition to intra-layer correlation, we also propose inter-layer correlation and verify its benefit. Through extensive experiments on image style classification and artist classification, we demonstrate that the proposed style vectors significantly outperforms CNN features coming from fully-connected layers, as well as outperforms the state-of-the-art deep representation.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133301763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: A Multi-Video Browser for Endoscopic Videos on Tablets
Authors: Marco A. Hudelist, Sabrina Kletz, Klaus Schöffmann
DOI: https://doi.org/10.1145/2964284.2973821
Abstract: We present a browser for endoscopic videos that is designed to make it easy to navigate and compare scenes on a tablet. It utilizes frame stripes at different levels of detail to quickly switch between fast and detailed navigation. Moreover, it uses saliency methods to determine which areas of a given keyframe contain the most information, further improving the visualization of the frame stripes. Because scenes with a lot of movement are often irrelevant out-of-patient footage, the tool supports filtering for scenes of low, medium, and high motion. The tool can be especially useful for patient debriefings as well as for educational purposes.