2012 IEEE International Conference on Multimedia and Expo: Latest Publications

Novel Binaural Spectro-temporal Algorithm for Speech Enhancement in Low SNR Environments
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.40
Po-Hsun Sung, Bo-Wei Chen, L. Jang, Jhing-Fa Wang
Abstract: A novel BInaural Spectro-Temporal (BIST) algorithm is proposed in this paper to increase speech intelligibility in low or negative SNR noisy environments. The BIST algorithm consists of two modules: a spatial mask that receives sound from a specific direction, and a spectro-temporal modulation filter for noise reduction. Most speech enhancement algorithms are not applicable in harsh environments because the speech energy is masked by the noise. To increase speech intelligibility in such conditions, a distinctive approach is proposed. First, the BIST algorithm uses binaural auditory processing as a spatial mask to separate speech and noise according to their locations. Next, a modulation filter is applied to reduce the noise source in the scale-rate (spectro-temporal modulation) domain according to their different acoustic features. It works like the spectro-temporal receptive field (STRF), which models the perceptual response of the human auditory cortex. The experimental results demonstrate that the proposed BIST speech enhancement algorithm improves intelligibility by 20% over the unprocessed noisy speech at an SNR of -10 dB.
Citations: 0
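
As a purely illustrative sketch of the first BIST module, the binaural spatial mask, the snippet below builds a simple interaural-phase-difference mask on the STFT of a two-channel recording. The function name, the IPD threshold, and the omission of the scale-rate (STRF-like) modulation filter are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np
from scipy.signal import stft, istft

def binaural_spatial_mask(left, right, fs=16000, nperseg=512, ipd_thresh=0.5):
    """Toy spatial mask: keep time-frequency bins whose interaural phase
    difference (IPD) is small, i.e. whose energy plausibly arrives from the
    frontal target direction, and zero out the rest."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))                 # per-bin interaural phase difference
    mask = (np.abs(ipd) < ipd_thresh).astype(float)
    _, enhanced = istft(mask * L, fs=fs, nperseg=nperseg)
    return enhanced
```
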
Regression Based Pose Estimation with Automatic Occlusion Detection and Rectification
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.160
Ibrahim Radwan, Abhinav Dhall, Jyoti Joshi, Roland Göcke
Abstract: Human pose estimation is a classic problem in computer vision. Statistical models based on part-based modelling and the pictorial structure framework have been widely used recently for articulated human pose estimation. However, the performance of these models has been limited by the presence of self-occlusion. This paper presents a learning-based framework to automatically detect and recover self-occluded body parts. We learn two different models: one for detecting occluded parts in the upper body and another for the lower body. To solve the key problem of knowing which parts are occluded, we construct Gaussian Process Regression (GPR) models that learn the parameters of the occluded body parts from their corresponding ground-truth parameters. Using these models, the pictorial structure of the occluded parts in unseen images is automatically rectified. The proposed framework outperforms a state-of-the-art pictorial structure approach for human pose estimation on three different datasets.
Citations: 11
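
A minimal sketch of the core regression step, using scikit-learn's Gaussian Process Regression on synthetic placeholder data: the feature layout (parameters of visible parts as inputs, parameters of one occluded part as targets), the kernel choice, and all dimensions are assumptions, not the paper's actual parameterisation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Placeholder training data: inputs are parameters of the visible parts
# (e.g. 3 parts x 3 params), targets are ground-truth parameters of one
# occluded part. Real data would come from annotated poses.
X_visible = rng.normal(size=(200, 9))
y_occluded = X_visible[:, :3] + 0.1 * rng.normal(size=(200, 3))

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2),
    normalize_y=True,
)
gpr.fit(X_visible, y_occluded)

# At test time: rectify the occluded part's parameters from the parameters
# the pictorial-structure model did manage to detect.
pred, std = gpr.predict(X_visible[:5], return_std=True)
```
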
See-through Image Enhancement through Sensor Fusion
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.168
Bo Fu, Mao Ye, Ruigang Yang, Cha Zhang
Abstract: Many hardware designs have been developed to allow a camera to be placed optically directly behind the screen. The purpose of such setups is to enable two-way video teleconferencing that maintains eye contact. However, the image from the see-through camera usually exhibits a number of imaging artifacts, such as a low signal-to-noise ratio, incorrect color balance, and loss of detail. We develop a novel image enhancement framework that utilizes an auxiliary color-plus-depth camera mounted on the side of the screen. By fusing the information from both cameras, we are able to significantly improve the quality of the see-through image. Experimental results demonstrate that our fusion method compares favorably against traditional image enhancement/warping methods that use only a single image.
Citations: 1
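
As a hedged illustration of the fusion idea, the sketch below corrects the see-through camera's color balance by matching per-channel statistics to the auxiliary side camera; the paper's actual framework also exploits the auxiliary depth channel and geometric warping, both of which are omitted here.

```python
import numpy as np

def fuse_color_statistics(see_through, auxiliary):
    """Crude sensor-fusion sketch: transfer per-channel mean and standard
    deviation from the auxiliary camera image to the noisy, color-shifted
    see-through image (both assumed to be HxWx3 uint8 views of the scene)."""
    out = np.empty_like(see_through, dtype=np.float64)
    for c in range(3):
        s = see_through[..., c].astype(float)
        a = auxiliary[..., c].astype(float)
        out[..., c] = (s - s.mean()) / (s.std() + 1e-6) * a.std() + a.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```
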
Image Classification with Group Fusion Sparse Representation
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.125
Yanan Liu
Abstract: In this paper we introduce a novel framework for image classification using local visual descriptors, called group fusion sparse representation (GFSR), which casts the classification problem as a linear regression model with sparse constraints on the regression coefficients. Considering the intrinsic discriminative property of prior class label information and the requirement of local consistency within a class, we add two penalties: one enforcing sparsity at the group level, and the other enforcing the fusion requirement. Experiments on several benchmark image corpora demonstrate that the proposed representation and classification method achieves state-of-the-art accuracy.
Citations: 1
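
The sketch below shows plain sparse-representation classification (an l1-coded query assigned to the class with the smallest reconstruction residual); GFSR's group-sparsity and fusion penalties are replaced by a simple Lasso penalty purely for illustration, so this is only the surrounding machinery, not the proposed model.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(dictionary, labels, query, alpha=0.01):
    """Sparse-representation classification: code the query over the training
    dictionary (rows = atoms, columns = feature dims) with an l1 penalty, then
    assign the class whose atoms give the smallest reconstruction residual."""
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    lasso.fit(dictionary.T, query)            # solve query ~ dictionary.T @ coef
    coef = lasso.coef_
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(query - dictionary[mask].T @ coef[mask])
        if residual < best_res:
            best_class, best_res = c, residual
    return best_class

# Toy usage with random data: 60 atoms of dimension 30 spread over 3 classes.
rng = np.random.default_rng(2)
D = rng.normal(size=(60, 30))
labels = np.repeat(np.arange(3), 20)
print(src_classify(D, labels, D[5] + 0.05 * rng.normal(size=30)))
```
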
Visual Summarization of the Social Image Collection Using Image Attractiveness Learned from Social Behaviors
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.196
Jin-Woo Jeong, Hyun-Ki Hong, Jee-Uk Heu, Iqbal Qasim, Dong-Ho Lee
Abstract: How to effectively summarize a large-scale image collection is still an important and open problem. In this paper, we propose a novel method to effectively generate a summary of a social image collection using image attractiveness learned from social behaviors on Flickr. To this end, we exploit the note information of Flickr images. The notes of Flickr images are user-generated bounding boxes with text annotations placed on interesting image regions. Using the visual features extracted from images that have notes, we generate attractiveness models for various concepts. Finally, the attractiveness models are exploited to make a summary of the social image collection. Through various user studies on image collections from Flickr groups, we show the feasibility of our method and discuss further directions.
Citations: 6
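
A hypothetical sketch of the overall flow: an attractiveness model trained on features of regions that did versus did not receive notes, then used to rank candidate images for the summary. The features, labels, concept handling, and top-k selection rule are placeholders, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Placeholder training set: visual features of regions that received Flickr
# notes (label 1) versus randomly sampled regions without notes (label 0).
X = rng.normal(size=(400, 64))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
attractiveness = LogisticRegression(max_iter=1000).fit(X, y)

def summarize(image_features, k=5):
    """Rank candidate images by predicted attractiveness and keep the top k
    as the visual summary (diversity handling is omitted)."""
    scores = attractiveness.predict_proba(image_features)[:, 1]
    return np.argsort(scores)[::-1][:k]

summary_ids = summarize(rng.normal(size=(50, 64)))
```
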
Class-Based Color Bag of Words for Fashion Retrieval
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.13
C. Grana, Daniele Borghesani, R. Cucchiara
Abstract: Color signatures, histograms, and bags of colors are basic and effective strategies for describing the color content of images, for retrieving images by their color appearance, or for providing color annotation. In some domains, colors assume a specific meaning for users, and color-based classification and retrieval should mirror the initial suggestions given by users in the training set. For instance, in the fashion world, the names given to the dominant color of a garment or a dress reflect the dictates of fashion rather than a uniform division of the color space. In this paper we propose a general approach to implementing a color signature as a trained bag of words, defined on the basis of user-defined color classes. The novel Class-based Color Bag of Words is an easily computable bag of color words, constructed following an approach similar to the Median Cut algorithm but biased by the color distribution in the trained classes. Moreover, to dramatically reduce the computational effort, we propose 3D integral histograms, a 3D extension of integral images that is easily extensible to many histogram-based signatures in 3D color space. Several comparisons on large fashion datasets confirm the discriminant power of this signature.
Citations: 10
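
One plausible reading of the integral-histogram idea is sketched below: pixels are first quantized into color words, and a cumulative volume lets the bag-of-colors of any image rectangle be read off with a four-corner lookup. The exact construction of the paper's 3D integral histograms over the 3D color space may differ; the function names here are illustrative only.

```python
import numpy as np

def integral_histogram(label_map, n_bins):
    """Build a (H+1, W+1, n_bins) cumulative volume for an image whose pixels
    have already been quantized into color-word indices in [0, n_bins)."""
    H, W = label_map.shape
    one_hot = np.zeros((H, W, n_bins))
    one_hot[np.arange(H)[:, None], np.arange(W)[None, :], label_map] = 1.0
    ih = np.zeros((H + 1, W + 1, n_bins))
    ih[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return ih

def rect_histogram(ih, top, left, bottom, right):
    # Standard four-corner lookup: histogram of rows [top, bottom) and
    # columns [left, right) in O(n_bins) regardless of rectangle size.
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])

# Usage: quantize an image into n_bins color words first, then e.g.
# rect_histogram(integral_histogram(words, 32), 10, 20, 60, 80).
```
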
A Model Predictive Controller for Frame-Level Rate Control in Multiview Video Coding
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.69
B. Vizzotto, B. Zatt, M. Shafique, S. Bampi, J. Henkel
Abstract: In this work, we present a novel frame-level rate control algorithm for the Multiview Video Coding encoder that adopts the Model Predictive Control technique in order to provide low bitrate fluctuation and high video quality. Our Model Predictive Rate Control (MPRC) predicts the bitrate for a frame by employing (i) inter-view, inter-GOP (Group of Pictures) phase-based bitrate prediction, and (ii) temporal (intra-GOP) target-bitrate linear weighting. Moreover, the MPRC also defines an optimal control action through frame-level QP value selection. Experimental results demonstrate that our MPRC bitrate prediction incurs a Mean Bit Estimation Error (MBEE) of 1.13%, compared to 2.46% for single-view-based rate control and 1.61% for the state-of-the-art MVC rate control. Our solution also provides on average a 0.876 dB BD-PSNR increase and a 28.92% BD-Bitrate reduction, while providing smoother quality and bitrate variations compared to the state of the art.
Citations: 11
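
A toy illustration of the frame-level control action (QP selection against a target bit budget): the rate model and QP range below are hypothetical, and the paper's MPRC additionally uses inter-view/inter-GOP phase-based prediction and a predictive-control formulation rather than this single-step search.

```python
def select_qp(target_bits, predicted_bits_at, qp_range=range(20, 46)):
    """Toy frame-level control action: pick the QP whose predicted bit cost
    is closest to the frame's target budget (no look-ahead horizon here)."""
    return min(qp_range, key=lambda qp: abs(predicted_bits_at(qp) - target_bits))

# Hypothetical exponential rate model R(QP) = a * 2**(-QP/6); 'a' is made up.
rate_model = lambda qp: 8.0e5 * 2 ** (-qp / 6)
qp = select_qp(target_bits=60_000, predicted_bits_at=rate_model)
print(qp)
```
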
Subjective Crosstalk Assessment Methodology for Auto-stereoscopic Displays
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.177
Liyuan Xing, Jie Xu, K. Skildheim, A. Perkis, T. Ebrahimi
Abstract: Crosstalk is one of the most annoying distortions in the visualization stage of stereoscopic systems. In particular, both the pattern and the amount of crosstalk in multi-view auto-stereoscopic displays are more complex than in 2-view stereoscopic displays because of their dependence on viewing angle. Objective measures exist for assessing system crosstalk in auto-stereoscopic displays. However, in addition to system crosstalk, the crosstalk perceived by users is also affected by scene content. Moreover, some crosstalk is arguably beneficial in auto-stereoscopic displays. Therefore, in this paper, we further assess how crosstalk is perceived by users for various scene contents and different viewing positions on auto-stereoscopic displays. In particular, the proposed subjective crosstalk assessment methodology is realistic, placing no restrictions on the users' viewing behavior, and is not limited to the specific technique used in the auto-stereoscopic display. The test was performed on a slanted-parallax-barrier-based auto-stereoscopic display. The subjective crosstalk assessment results are consistent with the system crosstalk, while providing additional information on how crosstalk perception relates to scene content and viewing position. This knowledge can be used to design new crosstalk perception metrics.
Citations: 4
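
For context, the sketch below computes the commonly used black-level-corrected system-crosstalk ratio from luminance measurements; it is the kind of objective measure the subjective scores are compared against, not a method proposed in the paper, and the example numbers are invented.

```python
def system_crosstalk(lum_unintended, lum_intended, lum_black):
    """Black-level-corrected system-crosstalk ratio at one viewing position:
    luminance leaking from the unintended view divided by the luminance of
    the intended view."""
    return (lum_unintended - lum_black) / (lum_intended - lum_black)

# Example: 6 cd/m^2 of leakage, 120 cd/m^2 intended signal, 0.5 cd/m^2 black level.
print(f"{system_crosstalk(6.0, 120.0, 0.5):.1%}")   # ~4.6%
```
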
An Improved Template-Based Approach to Keyword Spotting Applied to the Spoken Content of User Generated Video Blogs
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.10
M. Barakat, C. Ritz, D. Stirling
Abstract: This paper presents a new technique for preparing word templates to improve the performance of dynamic time warping based keyword spotting. The proposed technique selects one reference template from a small set of examples and, in contrast to existing model-based approaches, does not require extensive training. When the technique is used to select templates for searching for keywords in a clean speech database and within a set of user-generated video blogs, the resulting precision and recall are superior to those of existing template-selection approaches. As opposed to automatic speech recognition approaches, the technique is promising for searching for keywords that are not adequately represented in training databases.
Citations: 8
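
A sketch of one way to pick a single reference template from a handful of spoken examples without model training: a medoid-style selection under DTW distance. The selection criterion actually proposed in the paper differs in detail, and the DTW below is deliberately unconstrained and unoptimized.

```python
import numpy as np

def dtw_distance(A, B):
    """Plain DTW between two feature sequences (frames x dims), Euclidean
    local cost, no band constraint, length-normalized."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def pick_reference_template(examples):
    """Medoid-style selection: keep the example whose summed DTW distance to
    the other examples is smallest, i.e. the most 'central' spoken example."""
    totals = [sum(dtw_distance(e, o) for o in examples if o is not e)
              for e in examples]
    return examples[int(np.argmin(totals))]
```
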
Multi-hypothesis Projection-Based Shift Estimation for Sweeping Panorama Reconstruction
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.38
Tuan Q. Pham, P. Cox
Abstract: Global alignment is an important step in many imaging applications for hand-held cameras. We propose a fast algorithm that can handle large global translations in either the x- or y-direction from a pan-tilt camera. The algorithm estimates the translations in the x- and y-directions separately using 1D correlation of the absolute gradient projections along the x- and y-axes. Synthetic experiments show that the proposed multiple-shift-hypotheses approach is robust to translations of up to 90% of the image width, whereas other projection-based alignment methods can handle only up to 25%. The proposed approach can also handle larger rotations than other methods. The robustness of the alignment to non-purely-translational image motion and to moving objects in the scene is demonstrated by a sweeping panorama application on live images from a Canon camera with minimal user interaction.
Citations: 0
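
A minimal sketch of projection-based shift estimation with multiple hypotheses: correlate the column-wise absolute-gradient projections of two frames and keep the k strongest correlation peaks as candidate horizontal shifts (vertical shifts use row projections analogously). Proper peak picking and the paper's hypothesis-selection logic are omitted; the function name and defaults are assumptions.

```python
import numpy as np
from scipy.signal import correlate

def projection_shift_hypotheses(img_a, img_b, k=3):
    """Return k candidate horizontal shifts between two grayscale frames,
    estimated from 1D correlation of absolute-gradient column projections."""
    proj_a = np.abs(np.diff(img_a.astype(float), axis=1)).sum(axis=0)
    proj_b = np.abs(np.diff(img_b.astype(float), axis=1)).sum(axis=0)
    proj_a -= proj_a.mean()
    proj_b -= proj_b.mean()
    corr = correlate(proj_b, proj_a, mode='full')
    lags = np.arange(-len(proj_a) + 1, len(proj_b))
    # Note: the k largest raw samples may cluster around one peak; a real
    # implementation would apply non-maximum suppression to get distinct hypotheses.
    best = np.argsort(corr)[::-1][:k]
    return list(zip(lags[best], corr[best]))
```
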