2012 IEEE International Conference on Multimedia and Expo: Latest Publications

Novel Binaural Spectro-temporal Algorithm for Speech Enhancement in Low SNR Environments
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.40
Po-Hsun Sung, Bo-Wei Chen, L. Jang, Jhing-Fa Wang
Abstract: A novel BInaural Spectro-Temporal (BIST) algorithm is proposed in this paper to increase speech intelligibility in low or negative SNR noisy environments. The BIST algorithm consists of two modules: a spatial mask that receives sound from a specific direction, and a spectro-temporal modulation filter for noise reduction. Most speech enhancement algorithms are not applicable in harsh environments because the speech energy is masked by the noise. To increase speech intelligibility in such conditions, a distinctive approach is proposed. First, the BIST algorithm uses binaural auditory processing as a spatial mask to separate speech and noise according to their locations. Next, a modulation filter is applied to reduce the noise source in the scale-rate (spectro-temporal modulation) domain according to their different acoustic features. It works like the spectro-temporal receptive field (STRF), which models the perceptual response of the human auditory cortex. The experimental results demonstrate that the proposed BIST speech enhancement algorithm improves intelligibility by 20% over the unprocessed noisy speech at an SNR of -10 dB.
Citations: 0
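
As a purely illustrative sketch of the first BIST module, the binaural spatial mask, the snippet below builds a simple interaural-phase-difference mask on the STFT of a two-channel recording. The function name, the IPD threshold, and the omission of the scale-rate (STRF-like) modulation filter are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np
from scipy.signal import stft, istft

def binaural_spatial_mask(left, right, fs=16000, nperseg=512, ipd_thresh=0.5):
    """Toy spatial mask: keep time-frequency bins whose interaural phase
    difference (IPD) is small, i.e. whose energy plausibly arrives from the
    frontal target direction, and zero out the rest."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))                 # per-bin interaural phase difference
    mask = (np.abs(ipd) < ipd_thresh).astype(float)
    _, enhanced = istft(mask * L, fs=fs, nperseg=nperseg)
    return enhanced
```
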
Regression Based Pose Estimation with Automatic Occlusion Detection and Rectification
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.160
Ibrahim Radwan, Abhinav Dhall, Jyoti Joshi, Roland Göcke
Abstract: Human pose estimation is a classic problem in computer vision. Statistical models based on part-based modelling and the pictorial structure framework have been widely used recently for articulated human pose estimation. However, the performance of these models has been limited by the presence of self-occlusion. This paper presents a learning-based framework to automatically detect and recover self-occluded body parts. We learn two different models: one for detecting occluded parts in the upper body and another for the lower body. To solve the key problem of knowing which parts are occluded, we construct Gaussian Process Regression (GPR) models that learn the parameters of the occluded body parts from their corresponding ground-truth parameters. Using these models, the pictorial structure of the occluded parts in unseen images is automatically rectified. The proposed framework outperforms a state-of-the-art pictorial structure approach for human pose estimation on three different datasets.
Citations: 11
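
A minimal sketch of the core regression step, using scikit-learn's Gaussian Process Regression on synthetic placeholder data: the feature layout (parameters of visible parts as inputs, parameters of one occluded part as targets), the kernel choice, and all dimensions are assumptions, not the paper's actual parameterisation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Placeholder training data: inputs are parameters of the visible parts
# (e.g. 3 parts x 3 params), targets are ground-truth parameters of one
# occluded part. Real data would come from annotated poses.
X_visible = rng.normal(size=(200, 9))
y_occluded = X_visible[:, :3] + 0.1 * rng.normal(size=(200, 3))

gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2),
    normalize_y=True,
)
gpr.fit(X_visible, y_occluded)

# At test time: rectify the occluded part's parameters from the parameters
# the pictorial-structure model did manage to detect.
pred, std = gpr.predict(X_visible[:5], return_std=True)
```
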
See-through Image Enhancement through Sensor Fusion
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.168
Bo Fu, Mao Ye, Ruigang Yang, Cha Zhang
Abstract: Many hardware designs have been developed to allow a camera to be placed optically directly behind the screen. The purpose of such setups is to enable two-way video teleconferencing that maintains eye contact. However, the image from the see-through camera usually exhibits a number of imaging artifacts, such as a low signal-to-noise ratio, incorrect color balance, and loss of detail. We develop a novel image enhancement framework that utilizes an auxiliary color-plus-depth camera mounted on the side of the screen. By fusing the information from both cameras, we are able to significantly improve the quality of the see-through image. Experimental results demonstrate that our fusion method compares favorably against traditional image enhancement/warping methods that use only a single image.
Citations: 1
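
As a hedged illustration of the fusion idea, the sketch below corrects the see-through camera's color balance by matching per-channel statistics to the auxiliary side camera; the paper's actual framework also exploits the auxiliary depth channel and geometric warping, both of which are omitted here.

```python
import numpy as np

def fuse_color_statistics(see_through, auxiliary):
    """Crude sensor-fusion sketch: transfer per-channel mean and standard
    deviation from the auxiliary camera image to the noisy, color-shifted
    see-through image (both assumed to be HxWx3 uint8 views of the scene)."""
    out = np.empty_like(see_through, dtype=np.float64)
    for c in range(3):
        s = see_through[..., c].astype(float)
        a = auxiliary[..., c].astype(float)
        out[..., c] = (s - s.mean()) / (s.std() + 1e-6) * a.std() + a.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```
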
Image Classification with Group Fusion Sparse Representation
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.125
Yanan Liu
Abstract: In this paper we introduce a novel framework for image classification using local visual descriptors, called group fusion sparse representation (GFSR), which casts the classification problem as a linear regression model with sparse constraints on the regression coefficients. Considering the intrinsic discriminative property of prior class label information and the requirement of local consistency within a class, we add two penalties: one enforcing sparsity at the group level, and the other enforcing the fusion requirement. Experiments on several benchmark image corpora demonstrate that the proposed representation and classification method achieves state-of-the-art accuracy.
Citations: 1
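
The sketch below shows plain sparse-representation classification (an l1-coded query assigned to the class with the smallest reconstruction residual); GFSR's group-sparsity and fusion penalties are replaced by a simple Lasso penalty purely for illustration, so this is only the surrounding machinery, not the proposed model.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(dictionary, labels, query, alpha=0.01):
    """Sparse-representation classification: code the query over the training
    dictionary (rows = atoms, columns = feature dims) with an l1 penalty, then
    assign the class whose atoms give the smallest reconstruction residual."""
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    lasso.fit(dictionary.T, query)            # solve query ~ dictionary.T @ coef
    coef = lasso.coef_
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(query - dictionary[mask].T @ coef[mask])
        if residual < best_res:
            best_class, best_res = c, residual
    return best_class

# Toy usage with random data: 60 atoms of dimension 30 spread over 3 classes.
rng = np.random.default_rng(2)
D = rng.normal(size=(60, 30))
labels = np.repeat(np.arange(3), 20)
print(src_classify(D, labels, D[5] + 0.05 * rng.normal(size=30)))
```
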
Visual Summarization of the Social Image Collection Using Image Attractiveness Learned from Social Behaviors
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.196
Jin-Woo Jeong, Hyun-Ki Hong, Jee-Uk Heu, Iqbal Qasim, Dong-Ho Lee
Abstract: How to effectively summarize a large-scale image collection is still an important and open problem. In this paper, we propose a novel method to effectively generate a summary of a social image collection using image attractiveness learned from social behaviors on Flickr. To this end, we exploit the note information of Flickr images. The notes of Flickr images are user-generated bounding boxes with text annotations placed on interesting image regions. Using the visual features extracted from images that have notes, we generate attractiveness models for various concepts. Finally, the attractiveness models are exploited to make a summary of the social image collection. Through various user studies on image collections from Flickr groups, we show the feasibility of our method and discuss further directions.
Citations: 6
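
A hypothetical sketch of the overall flow: an attractiveness model trained on features of regions that did versus did not receive notes, then used to rank candidate images for the summary. The features, labels, concept handling, and top-k selection rule are placeholders, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Placeholder training set: visual features of regions that received Flickr
# notes (label 1) versus randomly sampled regions without notes (label 0).
X = rng.normal(size=(400, 64))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
attractiveness = LogisticRegression(max_iter=1000).fit(X, y)

def summarize(image_features, k=5):
    """Rank candidate images by predicted attractiveness and keep the top k
    as the visual summary (diversity handling is omitted)."""
    scores = attractiveness.predict_proba(image_features)[:, 1]
    return np.argsort(scores)[::-1][:k]

summary_ids = summarize(rng.normal(size=(50, 64)))
```
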
Class-Based Color Bag of Words for Fashion Retrieval
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.13
C. Grana, Daniele Borghesani, R. Cucchiara
Abstract: Color signatures, histograms, and bags of colors are basic and effective strategies for describing the color content of images, for retrieving images by their color appearance, or for providing color annotation. In some domains, colors assume a specific meaning for users, and color-based classification and retrieval should mirror the initial suggestions given by users in the training set. For instance, in the fashion world, the names given to the dominant color of a garment or a dress reflect the dictates of fashion rather than a uniform division of the color space. In this paper we propose a general approach to implementing a color signature as a trained bag of words, defined on the basis of user-defined color classes. The novel Class-based Color Bag of Words is an easily computable bag of color words, constructed following an approach similar to the Median Cut algorithm but biased by the color distribution in the trained classes. Moreover, to dramatically reduce the computational effort, we propose 3D integral histograms, a 3D extension of integral images that is easily extensible to many histogram-based signatures in 3D color space. Several comparisons on large fashion datasets confirm the discriminant power of this signature.
Citations: 10
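
One plausible reading of the integral-histogram idea is sketched below: pixels are first quantized into color words, and a cumulative volume lets the bag-of-colors of any image rectangle be read off with a four-corner lookup. The exact construction of the paper's 3D integral histograms over the 3D color space may differ; the function names here are illustrative only.

```python
import numpy as np

def integral_histogram(label_map, n_bins):
    """Build a (H+1, W+1, n_bins) cumulative volume for an image whose pixels
    have already been quantized into color-word indices in [0, n_bins)."""
    H, W = label_map.shape
    one_hot = np.zeros((H, W, n_bins))
    one_hot[np.arange(H)[:, None], np.arange(W)[None, :], label_map] = 1.0
    ih = np.zeros((H + 1, W + 1, n_bins))
    ih[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return ih

def rect_histogram(ih, top, left, bottom, right):
    # Standard four-corner lookup: histogram of rows [top, bottom) and
    # columns [left, right) in O(n_bins) regardless of rectangle size.
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])

# Usage: quantize an image into n_bins color words first, then e.g.
# rect_histogram(integral_histogram(words, 32), 10, 20, 60, 80).
```
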
A Model Predictive Controller for Frame-Level Rate Control in Multiview Video Coding
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.69
B. Vizzotto, B. Zatt, M. Shafique, S. Bampi, J. Henkel
Abstract: In this work, we present a novel frame-level rate control algorithm for the Multiview Video Coding encoder that adopts the Model Predictive Control technique in order to provide low bitrate fluctuation and high video quality. Our Model Predictive Rate Control (MPRC) predicts the bitrate for a frame by employing (i) inter-view, inter-GOP (Group of Pictures) phase-based bitrate prediction, and (ii) temporal (intra-GOP) target-bitrate linear weighting. Moreover, the MPRC also defines an optimal control action through frame-level QP value selection. Experimental results demonstrate that our MPRC bitrate prediction incurs a Mean Bit Estimation Error (MBEE) of 1.13%, compared to 2.46% for single-view-based rate control and 1.61% for the state-of-the-art MVC rate control. Our solution also provides on average a 0.876 dB BD-PSNR increase and a 28.92% BD-Bitrate reduction, while providing smoother quality and bitrate variations compared to the state of the art.
Citations: 11
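
A toy illustration of the frame-level control action (QP selection against a target bit budget): the rate model and QP range below are hypothetical, and the paper's MPRC additionally uses inter-view/inter-GOP phase-based prediction and a predictive-control formulation rather than this single-step search.

```python
def select_qp(target_bits, predicted_bits_at, qp_range=range(20, 46)):
    """Toy frame-level control action: pick the QP whose predicted bit cost
    is closest to the frame's target budget (no look-ahead horizon here)."""
    return min(qp_range, key=lambda qp: abs(predicted_bits_at(qp) - target_bits))

# Hypothetical exponential rate model R(QP) = a * 2**(-QP/6); 'a' is made up.
rate_model = lambda qp: 8.0e5 * 2 ** (-qp / 6)
qp = select_qp(target_bits=60_000, predicted_bits_at=rate_model)
print(qp)
```
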
Subjective Crosstalk Assessment Methodology for Auto-stereoscopic Displays
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.177
Liyuan Xing, Jie Xu, K. Skildheim, A. Perkis, T. Ebrahimi
Abstract: Crosstalk is one of the most annoying distortions in the visualization stage of stereoscopic systems. In particular, both the pattern and the amount of crosstalk in multi-view auto-stereoscopic displays are more complex than in 2-view stereoscopic displays because of their dependence on viewing angle. Objective measures exist for assessing system crosstalk in auto-stereoscopic displays. However, in addition to system crosstalk, the crosstalk perceived by users is also affected by scene content. Moreover, some crosstalk is arguably beneficial in auto-stereoscopic displays. Therefore, in this paper, we further assess how crosstalk is perceived by users for various scene contents and different viewing positions on auto-stereoscopic displays. In particular, the proposed subjective crosstalk assessment methodology is realistic, placing no restrictions on the users' viewing behavior, and is not limited to the specific technique used in the auto-stereoscopic display. The test was performed on a slanted-parallax-barrier-based auto-stereoscopic display. The subjective crosstalk assessment results are consistent with the system crosstalk, while providing additional information on how crosstalk perception relates to scene content and viewing position. This knowledge can be used to design new crosstalk perception metrics.
Citations: 4
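
For context, the sketch below computes the commonly used black-level-corrected system-crosstalk ratio from luminance measurements; it is the kind of objective measure the subjective scores are compared against, not a method proposed in the paper, and the example numbers are invented.

```python
def system_crosstalk(lum_unintended, lum_intended, lum_black):
    """Black-level-corrected system-crosstalk ratio at one viewing position:
    luminance leaking from the unintended view divided by the luminance of
    the intended view."""
    return (lum_unintended - lum_black) / (lum_intended - lum_black)

# Example: 6 cd/m^2 of leakage, 120 cd/m^2 intended signal, 0.5 cd/m^2 black level.
print(f"{system_crosstalk(6.0, 120.0, 0.5):.1%}")   # ~4.6%
```
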
An Improved Template-Based Approach to Keyword Spotting Applied to the Spoken Content of User Generated Video Blogs
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.10
M. Barakat, C. Ritz, D. Stirling
Abstract: This paper presents a new technique for preparing word templates to improve the performance of dynamic time warping based keyword spotting. The proposed technique selects one reference template from a small set of examples and, in contrast to existing model-based approaches, does not require extensive training. When the technique is used to select templates for searching for keywords in a clean speech database and within a set of user-generated video blogs, the resulting precision and recall are superior to those of existing template-selection approaches. As opposed to automatic speech recognition approaches, the technique is promising for searching for keywords that are not adequately represented in training databases.
Citations: 8
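
A sketch of one way to pick a single reference template from a handful of spoken examples without model training: a medoid-style selection under DTW distance. The selection criterion actually proposed in the paper differs in detail, and the DTW below is deliberately unconstrained and unoptimized.

```python
import numpy as np

def dtw_distance(A, B):
    """Plain DTW between two feature sequences (frames x dims), Euclidean
    local cost, no band constraint, length-normalized."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def pick_reference_template(examples):
    """Medoid-style selection: keep the example whose summed DTW distance to
    the other examples is smallest, i.e. the most 'central' spoken example."""
    totals = [sum(dtw_distance(e, o) for o in examples if o is not e)
              for e in examples]
    return examples[int(np.argmin(totals))]
```
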
Multi-hypothesis Projection-Based Shift Estimation for Sweeping Panorama Reconstruction
2012 IEEE International Conference on Multimedia and Expo | Pub Date: 2012-07-09 | DOI: 10.1109/ICME.2012.38
Tuan Q. Pham, P. Cox
Abstract: Global alignment is an important step in many imaging applications for hand-held cameras. We propose a fast algorithm that can handle large global translations in either the x- or y-direction from a pan-tilt camera. The algorithm estimates the translations in the x- and y-directions separately using 1D correlation of the absolute gradient projections along the x- and y-axes. Synthetic experiments show that the proposed multiple-shift-hypotheses approach is robust to translations of up to 90% of the image width, whereas other projection-based alignment methods can handle only up to 25%. The proposed approach can also handle larger rotations than other methods. The robustness of the alignment to non-purely-translational image motion and to moving objects in the scene is demonstrated by a sweeping panorama application on live images from a Canon camera with minimal user interaction.
Citations: 0
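
A minimal sketch of projection-based shift estimation with multiple hypotheses: correlate the column-wise absolute-gradient projections of two frames and keep the k strongest correlation peaks as candidate horizontal shifts (vertical shifts use row projections analogously). Proper peak picking and the paper's hypothesis-selection logic are omitted; the function name and defaults are assumptions.

```python
import numpy as np
from scipy.signal import correlate

def projection_shift_hypotheses(img_a, img_b, k=3):
    """Return k candidate horizontal shifts between two grayscale frames,
    estimated from 1D correlation of absolute-gradient column projections."""
    proj_a = np.abs(np.diff(img_a.astype(float), axis=1)).sum(axis=0)
    proj_b = np.abs(np.diff(img_b.astype(float), axis=1)).sum(axis=0)
    proj_a -= proj_a.mean()
    proj_b -= proj_b.mean()
    corr = correlate(proj_b, proj_a, mode='full')
    lags = np.arange(-len(proj_a) + 1, len(proj_b))
    # Note: the k largest raw samples may cluster around one peak; a real
    # implementation would apply non-maximum suppression to get distinct hypotheses.
    best = np.argsort(corr)[::-1][:k]
    return list(zip(lags[best], corr[best]))
```
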