{"title":"Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-Specific Prefetching","authors":"Philipp Habermann, C. C. Chi, M. Alvarez-Mesa, B. Juurlink","doi":"10.1109/ISM.2015.97","DOIUrl":"https://doi.org/10.1109/ISM.2015.97","url":null,"abstract":"Context-based Adaptive Binary Arithmetic Coding is the entropy coding module in the most recent JCT-VC video coding standard HEVC/H.265. As in the predecessor H.264/AVC, CABAC is a well-known throughput bottleneck due to its strong data dependencies. Beside other optimizations, the replacement of the context model memory by a smaller cache has been proposed, resulting in an improved clock frequency. However, the effect of potential cache misses has not been properly evaluated. Our work fills this gap and performs an extensive evaluation of different cache configurations. Furthermore, it is demonstrated that application-specific context model prefetching can effectively reduce the miss rate and make it negligible. Best overall performance results were achieved with caches of two and four lines, where each cache line consists of four context models. Four cache lines allow a speed-up of 10% to 12% for all video configurations while two cache lines improve the throughput by 9% to 15% for high bitrate videos and by 1% to 4% for low bitrate videos.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115115309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Songle Widget: Making Animation and Physical Devices Synchronized with Music Videos on the Web","authors":"Masataka Goto, Kazuyoshi Yoshii, Tomoyasu Nakano","doi":"10.1109/ISM.2015.64","DOIUrl":"https://doi.org/10.1109/ISM.2015.64","url":null,"abstract":"This paper describes a web-based multimedia development framework, Songle Widget, that makes it possible to control computer-graphic animation and physical devices such as lighting devices and robots in synchronization with music publicly available on the web. To avoid the difficulty of time-consuming manual annotation, Songle Widget makes it easy to develop web-based applications with rigid music synchronization by leveraging music-understanding technologies. Four types of musical elements (music structure, hierarchical beat structure, melody line, and chords) have been automatically annotated for more than 920,000 songs on music-or video-sharing services and can readily be used by music-synchronized applications. Since errors are inevitable when elements are annotated automatically, Songle Widget takes advantage of a user-friendly crowdsourcing interface that enables users to correct them. This is effective when applications require error-free annotation. We made Songle Widget open to the public, and its capabilities and usefulness have been demonstrated in seven music-synchronized applications.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"364 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132451133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-Based Multimedia Copy Detection","authors":"Chahid Ouali, P. Dumouchel, Vishwa Gupta","doi":"10.1109/ISM.2015.40","DOIUrl":"https://doi.org/10.1109/ISM.2015.40","url":null,"abstract":"In this paper, we address the problem of multimedia content-based copy detection. We propose several audio and video fingerprints that are highly robust to audio and video transformations. We propose to accelerate the search of fingerprints by using a Graphics Processing Unit (GPU). To speedup this search even further, we propose a two-step search based on a clustering technique and a lookup table that reduces the number of comparisons between the query and the reference fingerprints. We evaluate our fingerprints on the well-known TRECVID 2009 and 2010 datasets, and we show that the proposed fingerprints outperform other state-of-the-art audio and video fingerprints while being significantly faster.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133096932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vclick: Endpoint Driven Enterprise WebRTC","authors":"Kundan Singh, J. Yoakum","doi":"10.1109/ISM.2015.92","DOIUrl":"https://doi.org/10.1109/ISM.2015.92","url":null,"abstract":"We present a robust, scalable and secure system architecture for web-based multimedia collaboration that keeps the application logic in the endpoint browser. Vclick is a simple and easy-to-use application for video interaction, collaboration and presence using HTML5 technologies including WebRTC (Web Real Time Communication), and is independent of legacy Voice-over-IP systems. Since its conception in early 2013, it has received many positive feedbacks, undergone improvements, and has been used in many enterprise communications research projects both in the cloud and on premise, on desktop as well as mobile. The techniques used and the challenges faced are useful to other emerging WebRTC applications.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116791116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Musical Similarity and Commonness Estimation Based on Probabilistic Generative Models","authors":"Tomoyasu Nakano, Kazuyoshi Yoshii, Masataka Goto","doi":"10.1142/S1793351X1640002X","DOIUrl":"https://doi.org/10.1142/S1793351X1640002X","url":null,"abstract":"This paper proposes a novel concept we call musical commonness, which is the similarity of a song to a set of songs, in other words, its typicality. This commonness can be used to retrieve representative songs from a song set (e.g., songs released in the 80s or 90s). Previous research on musical similarity has compared two songs but has not evaluated the similarity of a song to a set of songs. The methods presented here for estimating the similarity and commonness of polyphonic musical audio signals are based on a unified framework of probabilistic generative modeling of four musical elements (vocal timbre, musical timbre, rhythm, and chord progression). To estimate the commonness, we use a generative model trained from a song set instead of estimating musical similarities of all possible song-pairs by using a model trained from each song. In experimental evaluation, we used 3278 popular music songs. Estimated song-pair similarities are comparable to ratings by a musician at the 0.1% significance level for vocal and musical timbre, at the 1% level for rhythm, and the 5% level for chord progression. Results of commonness evaluation show that the higher the musical commonness is, the more similar a song is to songs of a song set.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124869402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Error Protection for the Streaming of Motion JPEG 2000 Video over Variable Bit Error Rate Channels","authors":"G. Baruffa, F. Frescura","doi":"10.1109/ISM.2015.30","DOIUrl":"https://doi.org/10.1109/ISM.2015.30","url":null,"abstract":"In this paper we present a technique that can be used for optimizing the streaming of Motion JPEG 2000 video when the communication channel can be abstracted as a binary symmetric channel (BSC), characterized by a certain slowly variable rate of transmission errors. The video is packetized and every packet is protected with a Reed-Solomon forward error correction (FEC) code and then sent on the communication channel. The FEC helps to recover from erased/error-affected source information bytes at the receiver-decoder. The optimized amount of error protection is chosen by using an uncomplicated mathematical expression, given the knowledge of the channel status. In particular, we show the results of simulations that outline the robustness of this technique and its clear advantage over more trivial solutions.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127299505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WTA Hash-Based Multimodal Feature Fusion for 3D Human Action Recognition","authors":"Jun Ye, Kai Li, K. Hua","doi":"10.1109/ISM.2015.11","DOIUrl":"https://doi.org/10.1109/ISM.2015.11","url":null,"abstract":"With the prevalence of the commodity depth sensors (e.g. Kinect), multimodal data including RGB stream, depth stream and audio stream have been utilized in various applications such as video games, education and health. Nevertheless, it is still very challenging to effectively fuse the features from multimodal data. In this paper, we propose a WTA (Winner-Take-All) Hash-based feature fusion algorithm and investigate its application in 3D human action recognition. Specifically, the WTA Hashing is performed to encode features from different modalities into the ordinal space. By leveraging the ordinal measures rather than using the absolute value of the original features, such feature embedding can provide a form of resilience to the scale and numerical perturbations. We propose a frame-level feature fusion algorithm and develop a WTA Hash-embedded warping algorithm to measure the similarity between two sequences. Experiments performed on three public 3D human action datasets show that the proposed fusion algorithm has achieved state-of-the-art recognition results even with the nearest neighbor search.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129587437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SalAd: A Multimodal Approach for Contextual Video Advertising","authors":"C. Xiang, Tam V. Nguyen, M. Kankanhalli","doi":"10.1109/ISM.2015.75","DOIUrl":"https://doi.org/10.1109/ISM.2015.75","url":null,"abstract":"The explosive growth of multimedia data on Internet has created huge opportunities for online video advertising. In this paper, we propose a novel advertising technique called SalAd, which utilizes textual information, visual content and the webpage saliency, to automatically associate the most suitable companion ads with online videos. Unlike most existing approaches that only focus on selecting the most relevant ads, SalAd further considers the saliency of selected ads to reduce intentional ignorance. SalAd consists of three basic steps. Given an online video and a set of advertisements, we first roughly identify a set of relevant ads based on the textual information matching. We then carefully select a sub-set of candidates based on visual content matching. In this regard, our selected ads are contextually relevant to online video content in terms of both textual information and visual content. We finally select the most salient ad among the relevant ads as the most appropriate one. To demonstrate the effectiveness of our method, we have conducted a rigorous eye-tracking experiment on two ad-datasets. The experimental results show that our method enhances the user engagement with the ad content while maintaining users' quality of video viewing experience.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122519519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adopter Centric API and Visual Programming Interface for the Definition of Strategies for Automated Camera Tracking","authors":"Benjamin Wulff, Andrew Wilson, Beate Jost, M. Ketterl","doi":"10.1109/ISM.2015.106","DOIUrl":"https://doi.org/10.1109/ISM.2015.106","url":null,"abstract":"The LectureSight system provides a facility for controlling robotic cameras during live presentation recordings in a fully automated way. In this paper we present how the system accounts for the very heterogeneous demands in the domain of lecture capture at universities. An API for the JavaScript programming language gives the adopter the freedom to formulate their own camera steering strategy. A graphical programming environment, based on the Open Roberta project, further eases the development of the steering logic. The accomplishments of the LectureSight project should serve as an example on how integrated measure and control systems can be made highly customizable and give the adopter the power to fully exploit the range of possibilities a technology provides.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114981476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Virtualization of Red-Button Signaling in Hybrid TV","authors":"A. Mikityuk, Martin Platschek, O. Friedrich","doi":"10.1109/ISM.2015.109","DOIUrl":"https://doi.org/10.1109/ISM.2015.109","url":null,"abstract":"Hybrid Broadcast Broadband TV (HbbTV) is a European Hybrid TV standard that combines web-based technologies with traditional TV broadcast services. With the release of the HbbTV 2.0 specification in mid-February 2015, the HbbTV standard has begun to gain momentum worldwide, e.g. in North and South America, in Asia and in the UK. Indeed, the ATSC standard 3.0 in North America will be harmonized with the HbbTV 2.0 version. In Europe we have already faced the fact that the HbbTV device market has become very fragmented. This is due to versioning of the standard, various hardware capabilities of HbbTV devices or the lack of the HbbTV standard support on millions of devices. In this work we address the challenge of device fragmentation on the HbbTV market with a cloud-enabled HbbTV concept. This concept is based on the virtualization of HbbTV application signaling or the so-called Red-button signaling. The Red-button signaling is terminated, executed and handled within the Cloud in our approach. This work presents the architecture to enable the Cloud HbbTV approach and the implementation of this architecture. Finally, the architecture evaluation and corresponding challenges are presented.","PeriodicalId":250353,"journal":{"name":"2015 IEEE International Symposium on Multimedia (ISM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129447292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}