2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

On the enhancement of dereverberation algorithms using multiple perceptual-evaluation criteria
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813373
Rafael Zambrano-Lopez, T. Prego, A. Lima, S. L. Netto
Abstract: This paper describes an enhancement strategy based on several perceptual-assessment criteria for dereverberation algorithms. The complete procedure is applied to an algorithm for reverberant speech enhancement based on single-channel blind spectral subtraction. This enhancement was implemented by combining different quality measures, namely the so-called QAreverb, the speech-to-reverberation modulation energy ratio (SRMR) and the perceptual evaluation of speech quality (PESQ). Experimental results, using a 4211-signal speech database, indicate that the proposed modifications can improve the word error rate (WER) of speech recognition systems by an average of 20%.
Citations: 0
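The abstract reports WER gains without restating how WER is computed. As a reminder, WER is usually (S + D + I) / N, obtained from a word-level Levenshtein alignment against a reference transcript. The minimal sketch below assumes whitespace tokenization and reads "an average of 20%" as a relative reduction (the abstract does not say whether it is relative or absolute); the function names are illustrative, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of each outer iteration, prev[j] is the edit distance between
    # the reference words consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction, e.g. 0.2 means a 20% relative improvement."""
    return (wer_before - wer_after) / wer_before
```

For example, dropping one word from a six-word reference gives a WER of 1/6, and going from 30% to 24% WER is a 20% relative improvement.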
Robust sound event classification by using denoising autoencoder
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813376
Jianchao Zhou, Liqun Peng, Xiaoou Chen, Deshun Yang
Abstract: Over the last decade, a great deal of research has been done on sound event classification. However, a main problem with sound event classification is that performance degrades sharply in the presence of noise. As spectrogram-based image features and denoising autoencoders reportedly have superior performance in noisy conditions, this paper proposes a new robust feature for sound event classification, called the denoising autoencoder image feature (DIF): an image feature extracted from an image-like representation produced by a denoising autoencoder. The performance of the feature is evaluated in a classification experiment using an SVM classifier on audio examples with different noise levels, and compared with that of baseline features, including mel-frequency cepstral coefficients (MFCC) and the spectrogram image feature. The proposed DIF demonstrates better performance under noise-corrupted conditions.
Citations: 5
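The abstract evaluates features on audio "with different noise levels" but does not say how the noisy examples were produced. One common approach, sketched here under that assumption, scales a noise recording so that the mixture reaches a target signal-to-noise ratio, SNR = 10*log10(Ps/Pn), where Ps and Pn are mean signal and noise powers. `mix_at_snr` and `measured_snr_db` are hypothetical helpers, not the authors' code.

```python
import math


def mix_at_snr(clean: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Return clean + g * noise, with g chosen so that 10*log10(Ps / Pn) == snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Target noise power is p_signal / 10^(snr_db / 10); amplitude gain is its
    # square root relative to the current noise power.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean: list[float], noisy: list[float]) -> float:
    """SNR of a mixture relative to its known clean component, in dB."""
    p_signal = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_resid)
```

Sweeping `snr_db` over a grid (for instance 20, 10, 0 dB) yields the kind of graded noise conditions the abstract describes, with the same clean clip reused at each level.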
Perceptual video quality assessment: Spatiotemporal pooling strategies for different distortions and visual maps
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813336
Mohammed A. Aabed, G. Al-Regib
Abstract: In this paper, we investigate the challenge of distortion map feature selection and spatiotemporal pooling in perceptual video quality assessment (PVQA). We analyze three distortion maps representing different visual features spatially and temporally: squared error, local pixel-level SSIM, and absolute difference of optical flow magnitudes. We examine the performance of each of these maps with different spatial and temporal pooling strategies across three databases. We identify the most effective statistical pooling strategies spatially and temporally with respect to PVQA. We also show the most significant spatial and temporal features correlated with perception for every distortion/feature map. Our results show that varying the pooling strategy and distortion maps yields a significant improvement in perceptual quality estimation. We also deduce insights from our results to better understand the sensitivity of human vision to distortions. We aim for these findings to provide perceptual cues and guidelines to researchers during metric design, perceptual feature selection, HVS modeling and pooling selection/optimization. We further show that the same distortions across databases can yield different results in terms of PVQA evaluation and verification.
Citations: 2
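To make spatiotemporal pooling concrete, the sketch below computes a per-pixel squared-error map for each frame, pools it to one number per frame (spatial pooling), and then pools the per-frame scores over time (temporal pooling). The three pooling functions (mean, Minkowski, worst-percentile) are generic examples chosen for illustration; the abstract does not list the exact strategies the authors compared.

```python
import math


def mean_pool(values):
    return sum(values) / len(values)


def minkowski_pool(values, p=2.0):
    """(mean |v|^p)^(1/p); larger p gives more weight to the largest distortions."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def worst_percent_pool(values, fraction=0.1):
    """Mean of the largest `fraction` of values (at least one value).

    Suitable for error maps where larger means worse; for similarity maps such
    as SSIM, the worst values are the smallest, so the sort order would flip.
    """
    k = max(1, math.ceil(fraction * len(values)))
    return sum(sorted(values, reverse=True)[:k]) / k


def squared_error_map(ref, dist):
    """Per-pixel squared error between two equally sized 2-D frames."""
    return [[(r - d) ** 2 for r, d in zip(rr, dr)] for rr, dr in zip(ref, dist)]


def video_score(ref_frames, dist_frames, spatial=mean_pool, temporal=mean_pool):
    """Spatial pooling within each frame, then temporal pooling across frames."""
    per_frame = []
    for ref, dist in zip(ref_frames, dist_frames):
        dmap = squared_error_map(ref, dist)
        per_frame.append(spatial([v for row in dmap for v in row]))
    return temporal(per_frame)
```

Swapping the `spatial` and `temporal` arguments is enough to compare pooling combinations on the same distortion map.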
Face video based touchless blood pressure and heart rate estimation
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813389
Monika Jain, Sujay Deb, A. Subramanyam
Abstract: Hypertension (high blood pressure) is the leading cause of the increasing number of premature deaths due to cardiovascular diseases. Continuous hypertension screening seems to be a promising way to take appropriate steps to alleviate hypertension-related diseases. Many studies have shown that physiological signals such as the photoplethysmogram (PPG) can be reliably used to predict blood pressure (BP) and heart rate (HR). However, existing approaches use a transmission-type or reflective-type wearable sensor to collect the PPG signal. These sensors are bulky and mostly require the assistance of a trained medical practitioner, which precludes these approaches from continuous BP monitoring outside medical centers. In this paper, we propose a novel touchless approach that predicts BP and HR using face-video-based PPG. Since facial video can easily be captured with a consumer-grade camera, this approach is a convenient way to monitor hypertension continuously outside medical centers. The approach is validated on face video data collected in our lab, with ground-truth BP and HR measured by a clinically approved BP monitor, the OMRON HBP1300. The accuracy of the method is measured in terms of normalized mean square error, mean absolute error and error standard deviation, and complies with the standards of the Association for the Advancement of Medical Instrumentation. A two-tailed dependent-samples t-test is also conducted to verify that there is no statistically significant difference between the BP and HR predicted by the proposed approach and those measured by the OMRON monitor.
Citations: 43
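The abstract evaluates predicted BP and HR against the OMRON readings using mean absolute error, error standard deviation and a two-tailed dependent-samples (paired) t-test. A minimal standard-library sketch of those statistics follows; the function names and the example readings are hypothetical. The t statistic is compared with the two-tailed critical value for n - 1 degrees of freedom (about 3.18 for df = 3 at alpha = 0.05); if |t| is below it, the difference is not statistically significant at that level.

```python
import math
import statistics


def paired_error_stats(predicted, reference):
    """Mean error, mean absolute error and error SD for paired measurements."""
    errors = [p - r for p, r in zip(predicted, reference)]
    return {
        "mean_error": statistics.fmean(errors),
        "mae": statistics.fmean([abs(e) for e in errors]),
        "error_sd": statistics.stdev(errors),  # sample SD, needs >= 2 pairs
    }


def paired_t_statistic(predicted, reference):
    """Return (t, df) with t = mean(d) / (sd(d) / sqrt(n)) for differences d.

    Raises ZeroDivisionError if all differences are identical (sd = 0).
    """
    d = [p - r for p, r in zip(predicted, reference)]
    n = len(d)
    t = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1
```

For example, systolic predictions of 120, 125, 130 and 118 mmHg against reference readings of 118, 126, 128 and 119 mmHg give a mean error of 0.5 mmHg and t of roughly 0.58 with df = 3, well below the critical value.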
Audiovisual quality study for videoconferencing on IP networks
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813379
Ines Saidi, Lu Zhang, Vincent Barriac, O. Déforges
Abstract: In this paper, an audiovisual quality assessment experiment was conducted on audiovisual clips collected using a PC-based videoconferencing application connected via a local IP network. The analyses of the experimental results provided a better understanding of the influence of network impairments (packet loss, jitter, delay) on perceived audio and video quality, as well as of their interaction effect on the overall audiovisual quality in videoconferencing applications. We updated the human-perception acceptability limits of audio-video synchronization for videoconferencing. Further, we investigated the contribution of this synchronization to audiovisual quality, both on its own and combined with network impairments. Finally, we proposed an integration model to estimate audiovisual quality in the studied context.
Citations: 3
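The abstract proposes an integration model without stating its form. A form frequently used in the audiovisual-quality literature combines audio quality (AQ) and video quality (VQ) additively and multiplicatively, AVQ = a0 + a1*AQ + a2*VQ + a3*AQ*VQ. The sketch below assumes that form and fits it to subjective scores by ordinary least squares through the normal equations; it illustrates the general technique and is not the authors' model.

```python
def design_row(aq, vq):
    """Regressors for AVQ = a0 + a1*AQ + a2*VQ + a3*AQ*VQ (assumed model form)."""
    return [1.0, aq, vq, aq * vq]


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: ratings do not vary enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_integration_model(aq, vq, avq):
    """Least-squares coefficients [a0, a1, a2, a3] via (X^T X) a = X^T y."""
    rows = [design_row(a, v) for a, v in zip(aq, vq)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, avq)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, aq, vq):
    return sum(c * t for c, t in zip(coeffs, design_row(aq, vq)))
```

In practice AQ, VQ and AVQ would be mean opinion scores from the same test conditions; the fitted coefficients indicate how strongly each modality and their interaction drive overall quality.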
A study of the perceptual relevance of the burst phase of stop consonants with implications in speech coding
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813374
Vincent Santini, P. Gournay, R. Lefebvre
Abstract: Stop consonants are an important constituent of the speech signal. They contribute significantly to its intelligibility and subjective quality. However, because of their dynamic and unpredictable nature, they tend to be difficult to encode using conventional approaches such as linear predictive coding and transform coding. This paper presents a system to detect, segment, and modify stop consonants in a speech signal. This system is then used to assess the following hypothesis: muting the burst phase of stop consonants has a negligible impact on the subjective quality of speech. The muting operation is implemented and its impact on subjective quality is evaluated on a database of speech signals. The results show that this apparently drastic alteration has in reality very little perceptual impact. The implications for speech coding are then discussed.
Citations: 0
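Once the burst boundaries are known, the muting operation itself is simple. The sketch below assumes that a detector and segmenter (not shown, and not described in enough detail in the abstract to reproduce) have already returned sample indices `start` and `end`. The short linear fades at each edge are an added assumption intended to avoid audible clicks; the abstract does not specify them.

```python
def mute_segment(samples, start, end, fade=16):
    """Return a copy of `samples` with [start, end) zeroed and linear edge fades.

    `fade` samples before `start` ramp down and `fade` samples from `end`
    onward ramp up; fade=0 gives a hard mute.
    """
    out = list(samples)
    n = len(out)
    start, end = max(0, start), min(n, end)
    # Fade-out: gains fall from fade/(fade+1) down to 1/(fade+1) just before start.
    for k in range(fade):
        i = start - fade + k
        if 0 <= i < n:
            out[i] *= 1.0 - (k + 1) / (fade + 1)
    # Mute the (assumed) burst segment.
    for i in range(start, end):
        out[i] = 0.0
    # Fade-in: gains rise from 1/(fade+1) back toward 1 after end.
    for k in range(fade):
        i = end + k
        if 0 <= i < n:
            out[i] *= (k + 1) / (fade + 1)
    return out
```

Returning a modified copy keeps the original signal available, which makes A/B listening comparisons between the muted and unmodified versions straightforward.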
A drift compensated reversible watermarking scheme for H.265/HEVC
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813358
S. Gaj, Shuvendu Rana, A. Sur, P. Bora
Abstract: In this paper, a compressed-domain, drift-compensated reversible watermarking scheme with high embedding capacity and minimal visual quality degradation is proposed for H.265/HEVC videos. Using compressed-domain syntax elements such as motion vectors and transformed residuals, a set of 4 × 4 Transform Blocks (TBs) of similar texture is chosen from consecutive I-frames for watermark embedding. Because of the texture similarity of the selected TBs, the differences between their transformed coefficients are equal or close to zero. Exploiting these difference statistics, a multilevel watermark is embedded in the compressed video by altering near-zero values in the differences of the transformed coefficients. A comprehensive set of experiments has been carried out to demonstrate the efficacy of the proposed scheme over existing methods in the literature.
Citations: 7
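The abstract relies on coefficient differences between similar blocks clustering at or near zero. Classic single-level histogram shifting shows why such a distribution permits reversible embedding: positive differences are shifted up by one to free the value 1, and each zero then carries one bit. The sketch below applies that generic technique to a list of integer coefficient differences; it is not the paper's multilevel, drift-compensated HEVC scheme, and the function names are hypothetical.

```python
def embed_bits(diffs, bits):
    """Histogram-shifting embedding into zero-valued integer differences.

    Returns (marked, n_embedded). Capacity equals the number of zeros in diffs.
    """
    marked, k = [], 0
    for d in diffs:
        if d > 0:
            marked.append(d + 1)      # shift: 1 -> 2, 2 -> 3, ... frees the value 1
        elif d == 0 and k < len(bits):
            marked.append(bits[k])    # 0 stays 0 (bit 0) or becomes 1 (bit 1)
            k += 1
        else:
            marked.append(d)          # negatives and unused zeros are unchanged
    return marked, k


def extract_bits(marked, n_bits):
    """Recover the first n_bits payload bits and the original differences."""
    bits, restored = [], []
    for m in marked:
        if m in (0, 1):
            bits.append(m)            # unused zeros read as extra 0 bits (sliced off)
            restored.append(0)
        elif m > 1:
            restored.append(m - 1)    # undo the shift
        else:
            restored.append(m)
    return bits[:n_bits], restored
```

Because every marked value maps back to exactly one original value, the receiver recovers both the payload and the unmodified differences, which is the defining property of reversible watermarking.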
Fast mode decision for HEVC intra coding with efficient mode skipping and improved RMD
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813355
Xin Lu, Nan Xiao, Yue Hu, Zhilu Wu, G. Martin
Abstract: HEVC employs a quad-tree-based Coding Unit (CU) structure to achieve a significant improvement in coding efficiency compared with previous standards. However, the computational complexity is greatly increased. We propose a fast mode decision algorithm to reduce intra coding complexity. First, an initial candidate list of intra modes is constructed for each Prediction Unit (PU). The prediction-mode correlation between adjacent quad-tree coding levels and between temporally neighbouring frames is used to predict the most likely coding mode. The number of prediction modes that need to be evaluated in the residual quad-tree (RQT) process is further reduced by taking the Hadamard cost of each prediction mode into consideration. Simulation results show that the proposed algorithm saves up to 51% of encoding time compared with the HM 13.0 implementation, while having a negligible impact on rate-distortion performance.
Citations: 8
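The Hadamard cost mentioned in the abstract is usually the sum of absolute transformed differences (SATD). The sketch below computes an unnormalized 4 x 4 SATD and uses it in a simplified rough-mode-decision (RMD) step that ranks candidate modes by J = SATD + lam * bits and keeps the cheapest few. The cost form, `lam`, `mode_bits` and `keep` are illustrative assumptions; HM applies its own normalization and candidate-list sizes, which are not reproduced here.

```python
# 4x4 Hadamard matrix (unnormalized, symmetric).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def satd4x4(orig, pred):
    """Sum of absolute values of H4 * (orig - pred) * H4^T for a 4x4 block."""
    resid = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    coeffs = matmul(matmul(H4, resid), transpose(H4))
    return sum(abs(c) for row in coeffs for c in row)


def rough_mode_decision(orig, predictions, mode_bits, lam, keep):
    """Rank intra modes by J = SATD + lam * bits and return the `keep` cheapest."""
    costs = {m: satd4x4(orig, p) + lam * mode_bits[m] for m, p in predictions.items()}
    return sorted(costs, key=costs.get)[:keep]
```

Only the surviving candidates would then go through the full rate-distortion check, which is where the time saving of RMD-style schemes comes from.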
A quantitative real time data analysis in vehicular speech environment with varying SNR
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813375
Sai Prithvi Gadde, Sam Tabaja, Philip Olivier, N. Jaber, Mahdi Ali, R. Chabaan, Scott Bone
Abstract: The purpose of this paper is to compare the performance of two common filters operating on noisy speech recorded in automobiles travelling at various speeds. The filters are based on spectral subtraction (SS) and Kalman filtering (KF). Whereas the literature contains studies based on simulated data, this paper uses real-time data collected in cars in search of an optimal solution. The comparisons were based on real recorded samples containing noisy speech signals, each approximately 2 minutes long. Several noise conditions representing the situations most commonly experienced by drivers were created, using varying car speeds (e.g., 40 mph, 70 mph), fan power settings, and window positions. The study was carried out using three different car models. The measured noisy voice signals were filtered using the different filtering techniques, and the resulting filtered signals were compared in the time and frequency domains, both quantitatively and psychometrically. Furthermore, the quantitative analysis approach was applied to the results for more accurate interpretation. Results show that SS outperforms KF in noise reduction, with much less speech distortion at the different signal-to-noise ratios (SNRs) tested. The results of the human listening tests are comparable with the simulation results. Overall, SS showed superior performance over KF in vehicular hands-free speech applications.
Citations: 0
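Spectral subtraction appears in two of the abstracts above, so the core of a basic power spectral subtraction for one frame is sketched here: the noise power is averaged over frames assumed to be noise-only, subtracted from the noisy power with an over-subtraction factor `alpha`, and clamped to a spectral floor `beta`. The parameter values are illustrative defaults, the STFT analysis and overlap-add resynthesis are omitted, and this is not the specific SS (or KF) configuration evaluated in either paper.

```python
import math


def estimate_noise_power(noise_frames_mag):
    """Average power per frequency bin over frames assumed to contain noise only."""
    n = len(noise_frames_mag)
    bins = len(noise_frames_mag[0])
    return [sum(f[k] ** 2 for f in noise_frames_mag) / n for k in range(bins)]


def spectral_subtraction(noisy_mag, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction for one frame.

    noisy_mag   -- |Y(k)| magnitudes of the noisy frame
    noise_power -- estimated noise power per bin
    alpha       -- over-subtraction factor
    beta        -- spectral floor, as a fraction of the noisy power
    Returns enhanced magnitudes; the noisy phase is typically reused to resynthesize.
    """
    out = []
    for y, n in zip(noisy_mag, noise_power):
        p = y * y
        out.append(math.sqrt(max(p - alpha * n, beta * p)))
    return out
```

The floor `beta` keeps bins from collapsing to zero, which limits the "musical noise" that plain subtraction tends to produce; raising `alpha` removes more noise at the cost of more speech distortion.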
Image coding using parametric texture synthesis
2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2016-09-01 DOI: 10.1109/MMSP.2016.7813339
Uday Singh Thakur, Bappaditya Ray
Abstract: Visual textures such as grass and water consist of dense, random variations in contrast that are perceptually indistinguishable to the human eye. Such textures are costly to encode with image and video codecs. For example, in the state-of-the-art compression standard High Efficiency Video Coding (HEVC), detailed textures typically show relatively strong blurring artifacts at low rates (high QPs). Texture synthesis is a process that, given a set of parameters, reconstructs a visually equivalent texture with decent visual quality. In this paper, texture synthesis is used as a tool in combination with HEVC, exploiting properties of human visual perception (HVP) by creating artificial textured content from model parameters at the decoder side. A novel scheme for the compression (prediction and quantization) of the parameters of complex-wavelet-based texture synthesis is introduced. The compressed parameters are sufficient to synthesize high-quality texture content at the decoder side. Simulation results show that, at the same rates, both subjective and objective quality are improved compared with HEVC.
Citations: 4
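The abstract describes compressing the synthesis parameters by prediction and quantization. A minimal closed-loop DPCM sketch shows the general idea: each parameter is predicted from the previously reconstructed value and only the quantized residual index is transmitted. The previous-value predictor, uniform quantizer and `step` size are generic assumptions; the paper's actual scheme for complex-wavelet texture parameters is not given in the abstract.

```python
def dpcm_encode(params, step):
    """Closed-loop DPCM: predict each value from the previous reconstruction."""
    indices, prev = [], 0.0
    for x in params:
        q = round((x - prev) / step)  # quantization index of the prediction residual
        indices.append(q)
        prev = prev + q * step        # track exactly what the decoder will reconstruct
    return indices


def dpcm_decode(indices, step):
    recon, prev = [], 0.0
    for q in indices:
        prev = prev + q * step
        recon.append(prev)
    return recon
```

Predicting from the reconstructed (rather than the original) previous value keeps encoder and decoder in lockstep, so each reconstruction error stays within step / 2 instead of accumulating along the sequence.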