{"title":"Image compression using adaptive sparse representations over trained dictionaries","authors":"A. Akbari, M. Trocan, B. Granado","doi":"10.1109/MMSP.2016.7813346","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813346","url":null,"abstract":"Sparse representation is a common approach for reducing the spatial redundancy by modelling an image as a linear combination of few atoms taken from an analytic or trained dictionary. This paper introduces a new image codec based on adaptive sparse representations wherein the visual salient information is considered into the rate allocation process. Firstly, the regions of the image that are more conspicuous to the human visual system are extracted using a classical graph-based method. Further, block-based sparse representation over a trained dictionary coupled with an adaptive sparse representation is proposed, such that the adaptivity is achieved by appropriately assigning more atoms of the dictionary to the blocks belonging to the salient regions. Experimental results show that the proposed method outperforms the existing image coding standards, such as JPEG and JPEG2000, which use an analytic dictionary, as well as the state-of-the-art codecs based on trained dictionaries.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134191072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color-guided depth refinement based on edge alignment","authors":"Hu Tian, Fei Li","doi":"10.1109/MMSP.2016.7813401","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813401","url":null,"abstract":"Depth maps captured by consumer-level depth cameras such as Kinect usually suffer from the problem of corrupted edges and missing depth values. In this paper, an effective approach with the support of guided color images is proposed to tackle this problem. Firstly, an effective two-pass alignment algorithm is used to reliably align the depth edges with color image edges. Then, a new depth map with refined edges is generated based on interpolated drift vectors. Finally, a constrained maximal bilateral filter is proposed to fill the holes. Compared with existing methods, our approach can better refine the depth edges and avoid blurred depths in areas of depth discontinuities, as demonstrated by experiments on real Kinect data.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131917785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial expression recognition with dynamic Gabor volume feature","authors":"Junkai Chen, Z. Chi, Hong Fu","doi":"10.1109/MMSP.2016.7813388","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813388","url":null,"abstract":"Facial expression recognition is a long standing problem in affective computing community. A key step is extracting effective features from face images. Gabor filters have been widely used for this purpose. However, a big challenge for Gabor filters is its high dimensionality. In this paper, we propose an efficient feature called dynamic Gabor volume feature (DGVF) based on Gabor filters while with a lower dimensionality for facial expression recognition. In our approach, we first apply Gabor filters with multi-scale and multi-orientation to extract different Gabor faces. And these Gabor faces are arranged into a 3-D volume and Histograms of Oriented Gradients from Three Orthogonal Planes (HOG-TOP) are further employed to encode the 3-D volume in a compact way. Finally, SVM is trained to perform the classification. The experiments conducted on the Extended Cohn-Kanade (CK+) Dataset show that the proposed DGVF is robust to capture and represent the facial appearance features. And our method also achieves a superior performance compared with the other state-of-the-art methods.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122995143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Saliency in objective video quality assessment: What is the ground truth?","authors":"Wei Zhang, Hantao Liu","doi":"10.1109/MMSP.2016.7813333","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813333","url":null,"abstract":"Finding ways to be able to objectively and reliably assess video quality as would be perceived by humans has become a pressing concern in the multimedia community. To enhance the performance of video quality metrics (VQMs), a research trend is to incorporate visual saliency aspects. Existing approaches have focused on utilizing a computational saliency model to improve a VQM. Since saliency models still remain limited in predicting where people look in videos, the benefits of inclusion of saliency in VQMs may heavily depend on the accuracy of the saliency model used. To gain an insight into the actual added value of saliency in VQMs, ground truth saliency obtained from eye-tracking instead of computational saliency is an essential prerequisite. However, collecting eye-tracking data within the context of video quality is confronted with a bias due to the involvement of massive stimulus repetition. In this paper, we introduce a new experimental methodology to alleviate such potential bias and consequently, to be able to deliver reliable intended data. We recorded eye movements from 160 human observers while they freely viewed 160 video stimuli distorted with different distortion types at various degradation levels. We analyse the extent to which ground truth saliency as well as computational saliency actually benefit existing state of the art VQMs. Our dataset opens new challenges for saliency modelling in video quality research and helps better gauge progress in developing saliency-based VQMs.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121658573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of feature-level and kernel-level data fusion methods in multi-sensory fall detection","authors":"Che-Wei Huang, Shrikanth S. Narayanan","doi":"10.1109/MMSP.2016.7813383","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813383","url":null,"abstract":"In this work, we studied the problem of fall detection using signals from tri-axial wearable sensors. In particular, we focused on the comparison of methods to combine signals from multiple tri-axial accelerometers which were attached to different body parts in order to recognize human activities. To improve the detection rate while maintaining a low false alarm rate, previous studies developed detection algorithms by cascading base algorithms and experimented on each sensory data separately. Rather than combining base algorithms, we explored the combination of multiple data sources. Based on the hypothesis that these sensor signals should provide complementary information to the characterization of human's physical activities, we benchmarked a feature level and a kernel-level fusions to learn the kernel that incorporates multiple sensors in the support vector classifier. The results show that given the same false alarm rate constraint, the detection rate improves when using signals from multiple sensors, compared to the baseline where no fusion was employed.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117276323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust propagated filtering with applications to image texture filtering and beyond","authors":"Hsin-Yuan Dennis Wen, Y. Wang","doi":"10.1109/MMSP.2016.7813341","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813341","url":null,"abstract":"Extracting meaningful structures from an image is an important task and benefits a wide range of image application tasks. However, it is typically very challenging to distinguish between noisy or textural patterns from image structures, especially when such patterns do not exhibit regularity (e.g., irregular textural patterns or those with varying scales). While existing edge-preserving image filters like bilateral, guided, or propagation filters aim at observing strong image edges, they cannot be easily applied to solve the above texture filtering tasks. In this paper, we propose robust propagated filter, which is an extension to propagation filters while exhibiting excellent ability in eliminating the aforementioned textural patterns when performing filtering. We show in our experimental results that our filter provides promising results on image filtering. Additional experiments on inverse image half toning and detail enhancement further verify the effectiveness of our proposed method.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115236255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient HEVC decoder for heterogeneous CPU with GPU systems","authors":"Biao Wang, M. Alvarez-Mesa, C. C. Chi, B. Juurlink, D. Souza, A. Ilic, N. Roma, L. Sousa","doi":"10.1109/MMSP.2016.7813353","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813353","url":null,"abstract":"The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards but at the cost of increased computational load, which makes it hard to achieve real-time encoding/decoding of high-resolution, high-quality video sequences. In this paper, we investigate how Graphics Processing Units (GPUs) can be employed to accelerate HEVC decoding. GPUs are known to provide massive processing capability for throughput computing kernels, but the HEVC entropy decoding kernel cannot be executed efficiently on GPUs. We therefore propose a complete HEVC decoding solution for heterogeneous CPU+GPU systems, in which the entropy decoder is executed on the CPU and the remaining kernels on the GPU. Furthermore, the decoder is pipelined such that the CPU and the GPU can decode different frames in parallel. The proposed CPU+GPU decoder achieves an average frame rate of 150 frames per second for Ultra HD 4K video sequences when four CPU cores are used with an NVIDIA GeForce Titan X GPU.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127201026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Laughter detection based on the fusion of local binary patterns, spectral and prosodic features","authors":"Stefany Bedoya, T. Falk","doi":"10.1109/MMSP.2016.7813391","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813391","url":null,"abstract":"Today, great focus has been placed on context-aware human-machine interaction, where systems are aware not only of the surrounding environment, but also about the mental/affective state of the user. Such knowledge can allow for the interaction to become more human-like. To this end, automatic discrimination between laughter and speech has emerged as an interesting, yet challenging problem. Typically, audio-or video-based methods have been proposed in the literature; humans, however, are known to integrate both sensory modalities during conversation and/or interaction. As such, this paper explores the fusion of support vector machine classifiers trained on local binary pattern (LBP) video features, as well as speech spectral and prosodic features as a way of improving laughter detection performance. Experimental results on the publicly-available MAHNOB Laughter database show that the proposed audio-visual fusion scheme can achieve a laughter detection accuracy of 93.3%, thus outperforming systems trained on audio or visual features alone.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114255694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-adaptive non-parametric texture similarity measure","authors":"M. Alfarraj, Yazeed Alaudah, G. Al-Regib","doi":"10.1109/MMSP.2016.7813338","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813338","url":null,"abstract":"In this paper, we introduce a non-parametric texture similarity measure based on the singular value decomposition of the curvelet coefficients followed by a content-based truncation of the singular values. This measure focuses on images with repeating structures and directional content such as those found in natural texture images. Such textural content is critical for image perception and its similarity plays a vital role in various computer vision applications. In this paper, we evaluate the effectiveness of the proposed measure using a retrieval experiment. The proposed measure outperforms the state-of-the-art texture similarity metrics on CUReT and PerTex texture databases, respectively.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132194888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new AL-FEC coding scheme with limited feedback","authors":"Wei Huang, Hao Chen, Yiling Xu, Zhu Li, Wenjun Zhang","doi":"10.1109/MMSP.2016.7813360","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813360","url":null,"abstract":"For the next generation mobile video broadcasting, especially in-band solutions that serves the mobile devices, a limited feedback scheme via cellular channel polling is feasible to give accurate real-time information on the broadcast receivers' channel erasure rate, and decoding buffer status. In this work, we propose an AL-FEC coding degree scheme based on this feedback, to achieve a better decode efficiency and save the code redundancy. Simulation results demonstrate the effectiveness of this solution, and open up new opportunities in the next generation broadcasting system design.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133950419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}