{"title":"On residual quad-tree coding in HEVC","authors":"Y. H. Tan, Chuohao Yeo, Hui Li Tan, Zhengguo Li","doi":"10.1109/MMSP.2011.6093805","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093805","url":null,"abstract":"In the current working draft of HEVC, residual quad-tree (RQT) coding is used to encode prediction residuals in both Intra and Inter coding units (CU). However, the rationale for using RQT as a coding tool is different in the two cases. For Intra prediction units, RQT provides an efficient syntax for coding a number of sub-blocks with the same intra prediction mode. For Inter CUs, RQT adapts to the spatial-frequency variations of the CU, using as large a transform size as possible while catering to local variations in residual statistics. While providing coding gains, effective use of RQT currently requires an exhaustive search of all possible combinations of transform sizes within a block. In this paper, we exploit our insights to develop two fast RQT algorithms, each designed to meet the needs of Intra and Inter prediction residual coding.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125245836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective pixel interpolation for spatial error concealment","authors":"Yi Ge, Bo Yan, Kairan Sun, H. Gharavi","doi":"10.1109/MMSP.2011.6093828","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093828","url":null,"abstract":"This paper proposes an effective algorithm for spatial error concealment with accurate edge detection and partitioning interpolation. Firstly, a new method is used for detecting possible edge pixels and their matching pixels around the lost block. Then, the true edge lines can be determined, with which the lost block is partitioned. Finally, based on the partition result, each lost pixel can be interpolated with correct reference pixels, which are in the same region with the lost pixel. Experimental results show that the proposed spatial error concealment method is obviously superior to the previous methods for different sequences by up to 4.04 dB.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126607403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing actions using salient features","authors":"Liang Wang, Debin Zhao","doi":"10.1109/MMSP.2011.6093832","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093832","url":null,"abstract":"Towards a compact video feature representation, we propose a novel feature selection methodology for action recognition based on the saliency maps of videos. Since saliency maps measure the perceptual importance of the pixels and regions in videos, selecting features using saliency maps enables us to find a feature representation that covers the informative parts of a video. Because saliency detection is a bottom-up procedure, some appearance changes or motions that are irrelevant to actions may also be detected as salient regions. To further improve the purity of the feature representation, we prune these irrelevant salient regions using the saliency values distribution and the spatial-temporal distribution of the salient regions. Extensive experiments are conducted to demonstrate that the proposed feature selection method largely improves the performance of bag-of-video-words model on action recognition based on three different attention models including a static attention model, a motion attention model and their combination.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"65 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114529890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system for dynamic playlist generation driven by multimodal control signals and descriptors","authors":"Luca Chiarandini, M. Zanoni, A. Sarti","doi":"10.1109/MMSP.2011.6093850","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093850","url":null,"abstract":"This work describes a general approach to multimedia playlist generation and description and an application of the approach to music information retrieval. The example of system that we implemented updates a musical playlist on the fly based on prior information (musical preferences); current descriptors of the song that is being played; and fine-grained and semantically rich descriptors (descriptors of user's gestures, of environment conditions, etc.). The system incorporates a learning system that infers the user's preferences. Subjective tests have been conducted on usability and quality of the recommendation system.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114849512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Age estimation based on extended non-negative matrix factorization","authors":"Ce Zhan, W. Li, P. Ogunbona","doi":"10.1109/MMSP.2011.6093779","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093779","url":null,"abstract":"Previous studies suggested that local appearance-based methods are more efficient than geometric-based and holistic methods for age estimation. This is mainly due to the fact that age information are usually encoded by the local features such as wrinkles and skin texture on the forehead or at the eye corners. However, the variations of theses features caused by other factors such as identity, expression, pose and lighting may be larger than that caused by aging. Thus, one of the key challenges of age estimation lies in constructing a feature space that could successfully recovers age information while ignoring other sources of variations. In this paper, non-negative matrix factorization (NMF) is extended to learn a localized non-overlapping subspace representation for age estimation. To emphasize the appearance variation in aging, one individual extended NMF subspace is learned for each age or age group. The age or age group of a given face image is then estimated based on its reconstruction error after being projected into the learned age subspaces. Furthermore, a coarse to fine scheme is employed for exact age estimation, so that the age is estimated within the pre-classified age groups. Cross-database tests are conducted using FG-NET and MORPH databases to evaluate the proposed method. Experimental results have demonstrated the efficacy of the method.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121638809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compression of compound images by combining several strategies","authors":"Cuiling Lan, Jizheng Xu, Feng Wu","doi":"10.1109/MMSP.2011.6093824","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093824","url":null,"abstract":"Compound images are combinations of text, graphics and natural images. They possess characteristics different from those of natural images, such as a strong anisotropy, sparse color histograms and repeated patterns. Former research on compressing them has mainly focused on developing certain strategies based on some of these characteristics but has failed so far to fully exploit them simultaneously. In this paper, we investigate the combination of four up-to-date strategies to construct a comprehensive scheme for compound image compression. We have implemented these strategies as four types of modes with variable block sizes. Experimental results show that the proposed scheme achieves significant coding gains for compound image compression at all bitrates.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125156449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexible markerless registration method for video augmented reality","authors":"L. Ling, I. Burnett, E. Cheng","doi":"10.1109/MMSP.2011.6093790","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093790","url":null,"abstract":"This paper proposes a flexible, markerless registration method that addresses the problem of realistic virtual object placement at any position in a video sequence. The registration consists of two steps: four points are specified by the user to build the world coordinate system, where the virtual object is rendered. A self-calibration camera tracking algorithm is then proposed to recover the camera viewpoint frame-by-frame, such that the virtual object can be dynamically and correctly rendered according to camera movement. The proposed registration method needs no reference fiducials, knowledge of camera parameters or the user environment, where the virtual object can be placed in any environment even without any distinct features. Experimental evaluations demonstrate low errors for several camera motion rotations around the X and Y axes for the self-calibration algorithm. Finally, virtual object rendering applications in different user environments are evaluated.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134114894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-dimensional correlation steganalysis","authors":"F. Farhat, A. Diyanat, S. Ghaemmaghami, M. Aref","doi":"10.1109/MMSP.2011.6093791","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093791","url":null,"abstract":"Multi-dimensional spatial analysis of image pixels have not been much investigated for the steganalysis of the LSB Steganographic methods. Pixel distribution based steganalysis methods could be thwarted by intelligently compensating statistical characteristics of image pixels, as reported in several papers. Simple LSB replacement methods have been improved by introducing smarter LSB embedding approaches, e.g. LSB matching and LSB+ methods, but they are basically the same in the sense of the LSB alteration. A new analytical method to detect LSB stego images is proposed in this paper. Our approach is based on the relative locations of image pixels that are essentially changed in an LSB embedding system. Furthermore, we introduce some new statistical features including “local entropies sum” and “clouds min sum” to achieve a higher performance. Simulation results show that our proposed approach outperforms some well-known LSB steganalysis methods, in terms of detection accuracy and the embedding rate estimation.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134126047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ObjectBook construction for large-scale semantic-aware image retrieval","authors":"Shiliang Zhang, Q. Tian, Qingming Huang, Wen Gao","doi":"10.1109/MMSP.2011.6093776","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093776","url":null,"abstract":"Automatic image annotation assigns semantic labels to images thus presents great potential to achieve semantic-aware image retrieval. However, existing annotation algorithms are not scalable to this emerging need, both in terms of computational efficiency and the number of tags they can deal with. Facilitated by recent development of the large-scale image category recognition data such as ImageNet, we extrapolate from it a model for scalable image annotation and semantic-aware image retrieval, namely ObjectBook. The element in the ObjectBook, which is called an ObjectWord, is defined as a collection of discriminative image patches annotated with the corresponding objects. We take ObjectBook as a high-level semantic preserving visual vocabulary, and hence are able to easily develop efficient image annotation and inverted file indexing strategies for large-scale image collections. The proposed retrieval strategy is compared with state-of-the-art algorithms. Experimental results manifest that the ObjectBook is both discriminative and scalable for large-scale semantic-aware image retrieval.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133434519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wyner-Ziv frame parallel decoding based on multicore processors","authors":"Alberto Corrales-García, José Luis Martínez, G. Fernández-Escribano, F. Quiles, W. Fernando","doi":"10.1109/MMSP.2011.6093835","DOIUrl":"https://doi.org/10.1109/MMSP.2011.6093835","url":null,"abstract":"Wyner-Ziv video coding presents a new paradigm which offers low-complexity video encoding. However, the Wyner-Ziv paradigm accumulates high complexity at the decoder side and this could involve difficulties for applications which have delay requisites. On the other hand, technological advances provide us with new hardware which supports parallel data processing. In this paper, a faster Wyner-Ziv video decoding scheme based on multicore processors is proposed. In this way, each frame is decoded by means of the collaboration between several processing units, achieving a time reduction up to 71% without significant rate-distortion drop penalty.","PeriodicalId":214459,"journal":{"name":"2011 IEEE 13th International Workshop on Multimedia Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124835858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}