{"title":"Novel Binaural Spectro-temporal Algorithm for Speech Enhancement in Low SNR Environments","authors":"Po-Hsun Sung, Bo-Wei Chen, L. Jang, Jhing-Fa Wang","doi":"10.1109/ICME.2012.40","DOIUrl":"https://doi.org/10.1109/ICME.2012.40","url":null,"abstract":"A novel BInaural Spectro-Temporal (BIST) algorithm is proposed in this paper to increase the speech intelligibility in low or negative SNR noisy environments. The BIST algorithm consists of two modules. One is the spatial mask for receiving sound from the specific direction, and the other is the spectro-temporal modulation filter for noise reduction. Most speech enhancement algorithms are not applicable in harsh environments because the energy of speech is covered by the noise. To increase the speech intelligibility in low or negative SNR noisy environments, a distinctive approach is proposed to solve this problem. First, the BIST algorithm takes binaural auditory processing as a spatial mask to separate the speech and noise according to their locations. Next, the modulation filter is applied to reduce the noise source in the scale-rate (spectro-temporal modulation) domain according to their different acoustic feature. It works like the spectro-temporal receptive field (STRF) which is the perception response of human auditory cortex. The experimental results demonstrate that the proposed BIST speech enhancement algorithm can improve 20% from the noisy speech at SNR-10dB.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123546754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression Based Pose Estimation with Automatic Occlusion Detection and Rectification","authors":"Ibrahim Radwan, Abhinav Dhall, Jyoti Joshi, Roland Göcke","doi":"10.1109/ICME.2012.160","DOIUrl":"https://doi.org/10.1109/ICME.2012.160","url":null,"abstract":"Human pose estimation is a classic problem in computer vision. Statistical models based on part-based modelling and the pictorial structure framework have been widely used recently for articulated human pose estimation. However, the performance of these models has been limited due to the presence of self-occlusion. This paper presents a learning-based framework to automatically detect and recover self-occluded body parts. We learn two different models: one for detecting occluded parts in the upper body and another one for the lower body. To solve the key problem of knowing which parts are occluded, we construct Gaussian Process Regression (GPR) models to learn the parameters of the occluded body parts from their corresponding ground truth parameters. Using these models, the pictorial structure of the occluded parts in unseen images is automatically rectified. The proposed framework outperforms a state-of-the-art pictorial structure approach for human pose estimation on 3 different datasets.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123602223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"See-through Image Enhancement through Sensor Fusion","authors":"Bo Fu, Mao Ye, Ruigang Yang, Cha Zhang","doi":"10.1109/ICME.2012.168","DOIUrl":"https://doi.org/10.1109/ICME.2012.168","url":null,"abstract":"Many hardware designs have been developed to allow a camera to be placed optically directly behind the screen. The purpose of such setups is to enable two-way video teleconferencing that maintains eye-contact. However, the image from the see-through camera usually exhibits a number of imaging artifacts such as low signal to noise ratio, incorrect color balance, and lost of details. We develop a novel image enhancement framework that utilizes an auxiliary color+depth camera that is mounted on the side of the screen. By fusing the information from both cameras, we are able to significantly improve the quality of the see-through image. Experimental results have demonstrated that our fusion method compares favorably against traditional image enhancement/warping methods that uses only a single image.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129414199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Classification with Group Fusion Sparse Representation","authors":"Yanan Liu","doi":"10.1109/ICME.2012.125","DOIUrl":"https://doi.org/10.1109/ICME.2012.125","url":null,"abstract":"In this paper we introduce a novel framework for image classification using local visual descriptors - group fusion sparse representation (GFSR), which casts the classification problem as a linear regression model with sparse constraints of the regression coefficients. Considering the intrinsic discriminative property of prior class label information, and the requirement of local consistency within a class, we add two penalties, one is for sparsity at group level, and the other is for the fusion demand. Experiments on several benchmark image corpora demonstrate that the proposed representation and classification method achieves state-of-the-art accuracy.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128213807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Summarization of the Social Image Collection Using Image Attractiveness Learned from Social Behaviors","authors":"Jin-Woo Jeong, Hyun-Ki Hong, Jee-Uk Heu, Iqbal Qasim, Dong-Ho Lee","doi":"10.1109/ICME.2012.196","DOIUrl":"https://doi.org/10.1109/ICME.2012.196","url":null,"abstract":"How to effectively summarize a large-scale image collection is still an important and open problem. In this paper, we propose a novel method to effectively generate a summary of the social image collection using image attractiveness learned from the social behaviors conducted in Flickr. To this end, we exploit the note information of Flickr images. The notes of Flickr images are user generated bounding boxes with text annotations assigned on the interesting image regions. Using the visual features extracted from the images that have notes, we have generated the attractiveness models for various concepts. Finally, the attractiveness models are exploited to make a summary of the social image collection. Through various user studies on the image collections from Flickr groups, we show the feasibility of our method and discuss further directions.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129133246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class-Based Color Bag of Words for Fashion Retrieval","authors":"C. Grana, Daniele Borghesani, R. Cucchiara","doi":"10.1109/ICME.2012.13","DOIUrl":"https://doi.org/10.1109/ICME.2012.13","url":null,"abstract":"Color signatures, histograms and bag of colors are basic and effective strategies for describing the color content of images, for retrieving images by their color appearance or providing color annotation. In some domains, colors assume a specific meaning for users and the color-based classification and retrieval should mirror the initial suggestions given by users in the training set. For instance in fashion world, the names given to the dominant color of a garment or a dress reflect the fashion dictact and not an uniform division of the color space. In this paper we propose a general approach to implement color signature as a trained bag of words, defined on the basis of user defined color classes. The novel Class-based Color Bag of Words is a easy computable bag of words of color, constructed following an approach similar to the Median Cut algorithm, but biased by color distribution in the trained classes. Moreover, to dramatically reduce the computational effort we propose 3D integral histograms, a 3D extension of integral images, easily extensible for many histogram-based signature in 3D color space. Several comparisons in large fashion datasets confirm the discriminant power of this signature.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132410129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Model Predictive Controller for Frame-Level Rate Control in Multiview Video Coding","authors":"B. Vizzotto, B. Zatt, M. Shafique, S. Bampi, J. Henkel","doi":"10.1109/ICME.2012.69","DOIUrl":"https://doi.org/10.1109/ICME.2012.69","url":null,"abstract":"In this work, we present a novel frame-level Rate Control algorithm for Multiview Video Coding encoder that adopts the Model Predictive Control technique in order to provide low bitrate fluctuation and high video quality. Our Model Predictive Rate Control (MPRC) predicts the bitrate for a frame by employing (i) inter-view inter-GOP (Group of Pictures) phase-based bitrate prediction, and (ii) temporal (intra-GOP) target bitrate linear weighting. Moreover, the MPRC also defines an optimal control action through frame-level QP value selection. Experimental results demonstrate that our MPRC bitrate prediction incurs a Mean Bit Estimation Error (MBEE) of 1.13% compared to 2.46% provided by single view-based Rate Control and 1.61% provided by the state-of-the-art MVC Rate Control. Our solution also provides on average 0.876dB BD-PSNR increase and 28.92% BD-Bitrate reduction while providing smoother quality and bitrate variations when compared to state-of-the-art.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125090348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subjective Crosstalk Assessment Methodology for Auto-stereoscopic Displays","authors":"Liyuan Xing, Jie Xu, K. Skildheim, A. Perkis, T. Ebrahimi","doi":"10.1109/ICME.2012.177","DOIUrl":"https://doi.org/10.1109/ICME.2012.177","url":null,"abstract":"Cross talk is one of the most annoying distortions in the visualization stage of stereoscopic systems. Specifically, both pattern and amount of cross talk in multi-view auto-stereoscopic displays are more complex because of viewing angle dependability, when compared to cross talk in 2-view stereoscopic displays. Regarding system cross talk there are objective measures to assess it in auto-stereoscopic displays. However, in addition to system cross talk, cross talk perceived by users is also impacted by scene content. Moreover, some cross talk is arguably beneficial in auto-stereoscopic displays. Therefore, in this paper, we further assess how cross talk is perceived by users with various scene contents and different viewing positions using auto-stereoscopic displays. In particular, the proposed subjective cross talk assessment methodology is realistic without restriction of the users viewing behavior and is not limited to the specific technique used in auto-stereoscopic displays. The test was performed on a slanted parallax barrier based auto-stereoscopic display. The subjective cross talk assessment results show their consistence to the system cross talk meanwhile more scene content and viewing position related cross talk perception information is provided. This knowledge can be used to design new cross talk perception metrics.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"576 1 Pt 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132675707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Template-Based Approach to Keyword Spotting Applied to the Spoken Content of User Generated Video Blogs","authors":"M. Barakat, C. Ritz, D. Stirling","doi":"10.1109/ICME.2012.10","DOIUrl":"https://doi.org/10.1109/ICME.2012.10","url":null,"abstract":"This paper presents a new technique for preparing word templates to improve the performance of dynamic time warping based keyword spotting. The proposed technique selects one reference template from a small set of examples and in contrast to existing model based approaches does not require extensive training. Precision and recall results from applying the technique to template selection for use in searching for keywords in a clean speech database and within a set of user generated video blogs are superior to existing approaches used to select a template. As opposed to automatic speech recognition approaches, the technique is promising for use in searching for keywords that are not adequately represented in training databases.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132827525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-hypothesis Projection-Based Shift Estimation for Sweeping Panorama Reconstruction","authors":"Tuan Q. Pham, P. Cox","doi":"10.1109/ICME.2012.38","DOIUrl":"https://doi.org/10.1109/ICME.2012.38","url":null,"abstract":"Global alignment is an important step in many imaging applications for hand-held cameras. We propose a fast algorithm that can handle large global translations in either x-or y-direction from a pan-tilt camera. The algorithm estimates the translations in x- and y-direction separately using 1D correlation of the absolute gradient projections along the x- and y-axis. Synthetic experiments show that the proposed multiple shift hypotheses approach is robust to translations up to 90% of the image width, whereas other projection-based alignment methods can handle up to 25% only. The proposed approach can also handle larger rotations than other methods. The robustness of the alignment to non-purely translational image motion and moving objects in the scene is demonstrated by a sweeping panorama application on live images from a Canon camera with minimal user interaction.","PeriodicalId":273567,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131789666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}