Title: On the enhancement of dereverberation algorithms using multiple perceptual-evaluation criteria
Authors: Rafael Zambrano-Lopez, T. Prego, A. Lima, S. L. Netto
Venue: 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), September 2016
DOI: https://doi.org/10.1109/MMSP.2016.7813373
Abstract: This paper describes an enhancement strategy for dereverberation algorithms based on several perceptual-assessment criteria. The complete procedure is applied to an algorithm for reverberant speech enhancement based on single-channel blind spectral subtraction. The enhancement was implemented by combining different quality measures, namely QAreverb, the speech-to-reverberation modulation energy ratio (SRMR), and the perceptual evaluation of speech quality (PESQ). Experimental results on a 4211-signal speech database indicate that the proposed modifications can improve the word error rate (WER) of speech recognition systems by an average of 20%.
Title: Robust sound event classification by using denoising autoencoder
Authors: Jianchao Zhou, Liqun Peng, Xiaoou Chen, Deshun Yang
Venue: 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), September 2016
DOI: https://doi.org/10.1109/MMSP.2016.7813376
Abstract: Over the last decade, a great deal of research has been devoted to sound event classification, but a main problem is that performance degrades sharply in the presence of noise. Since spectrogram-based image features and denoising autoencoders reportedly perform well in noisy conditions, this paper proposes a new robust feature for sound event classification, the denoising autoencoder image feature (DIF), which is extracted from an image-like representation produced by a denoising autoencoder. The feature is evaluated in a classification experiment using an SVM classifier on audio examples with different noise levels, and is compared with baseline features including mel-frequency cepstral coefficients (MFCC) and the spectrogram image feature. The proposed DIF demonstrates better performance under noise-corrupted conditions.
{"title":"Perceptual video quality assessment: Spatiotemporal pooling strategies for different distortions and visual maps","authors":"Mohammed A. Aabed, G. Al-Regib","doi":"10.1109/MMSP.2016.7813336","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813336","url":null,"abstract":"In this paper, we investigate the challenge of distortion map feature selection and spatiotemporal pooling in perceptual video quality assessment (PVQA). We analyze three distortion maps representing different visual features spatially and temporally: squared error, local pixel-level SSIM, and absolute difference of optical flow magnitudes. We examine the performance of each of these maps with different spatial and temporal pooling strategies across three databases. We identify the most effective statistical pooling strategies spatially and temporally with respect to PVQA. We also show the most significant spatial and temporal features correlated with perception for every distortion/feature map. Our results show that varying the pooling strategy and distortion maps yields a significant improvement in perceptual quality estimation. We also deduce insights from our results to better understand the sensitivity of human vision to distortions. We aim for these findings to provide perceptual cues and guidelines to researchers during metric design, perceptual feature selection, HVS modeling and pooling selection/optimization. We further show that the same distortions across databases can yield different results in terms of PVQA evaluation and verification.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130087249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face video based touchless blood pressure and heart rate estimation","authors":"Monika Jain, Sujay Deb, A. Subramanyam","doi":"10.1109/MMSP.2016.7813389","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813389","url":null,"abstract":"Hypertension (high blood pressure) is the leading cause for increasing number of premature deaths due to cardiovascular diseases. Continuous hypertension screening seems to be a promising approach in order to take appropriate steps to alleviate hypertension-related diseases. Many studies have shown that physiological signal like Photoplethysmogram (PPG) can be reliably used for predicting the Blood Pressure (BP) and Heart Rate (HR). However, the existing approaches use a transmission or reflective type wearable sensor to collect the PPG signal. These sensors are bulky and mostly require an assistance of a trained medical practitioner; which preclude these approaches from continuous BP monitoring outside the medical centers. In this paper, we propose a novel touchless approach that predicts BP and HR using the face video based PPG. Since the facial video can easily be captured using a consumer grade camera, this approach is a convenient way for continuous hypertension monitoring outside the medical centers. The approach is validated using the face video data collected in our lab, with the ground truth BP and HR measured using a clinically approved BP monitor OMRON HBP1300. Accuracy of the method is measured in terms of normalized mean square error, mean absolute error and error standard deviation; which complies with the standards mentioned by Association for the Advancement of Medical Instrumentation. Two-tailed dependent sample t-test is also conducted to verify that there is no statistically significant difference between the BP and HR predicted using the proposed approach and the BP and HR measured using OMRON.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116298706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Audiovisual quality study for videoconferencing on IP networks
Authors: Ines Saidi, Lu Zhang, Vincent Barriac, O. Déforges
Venue: 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), September 2016
DOI: https://doi.org/10.1109/MMSP.2016.7813379
Abstract: In this paper, an audiovisual quality assessment experiment was conducted on audiovisual clips collected using a PC-based videoconferencing application connected over a local IP network. The analysis of the experimental results provides a better understanding of the influence of network impairments (packet loss, jitter, delay) on perceived audio and video quality, as well as of their interaction effect on the overall audiovisual quality in videoconferencing applications. We updated the human-perception acceptability limits of audio-video synchronization for videoconferencing. Further, we investigated the contribution of this synchronization to the audiovisual quality, both on its own and in combination with network impairments. Finally, we proposed an integration model to estimate the audiovisual quality in the studied context.
{"title":"A study of the perceptual relevance of the burst phase of stop consonants with implications in speech coding","authors":"Vincent Santini, P. Gournay, R. Lefebvre","doi":"10.1109/MMSP.2016.7813374","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813374","url":null,"abstract":"Stop consonants are an important constituent of the speech signal. They contribute significantly to its intelligibility and subjective quality. However, because of their dynamic and unpredictable nature, they tend to be difficult to encode using conventional approaches such as linear predictive coding and transform coding. This paper presents a system to detect, segment, and modify stop consonants in a speech signal. This system is then used to assess the following hypothesis: Muting the burst phase of stop consonants has a negligible impact on the subjective quality of speech. The muting operation is implemented and its impact on subjective quality is evaluated on a database of speech signals. The results show that this apparently drastic alteration has in reality very little perceptual impact. The implications for speech coding are then discussed.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"PP 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126709610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A drift compensated reversible watermarking scheme for H.265/HEVC","authors":"S. Gaj, Shuvendu Rana, A. Sur, P. Bora","doi":"10.1109/MMSP.2016.7813358","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813358","url":null,"abstract":"In this paper, a compressed domain drift compensated reversible watermarking scheme is proposed with a high embedding capacity and the least amount of visual quality degradation for H.265/HEVC videos. Using compressed domain syntax elements, such as motion vector and transformed residual, a set of 4 × 4 Transform Blocks (TB) of similar texture are chosen from consecutive I Frames for watermark embedding. Due to texture similarity of these selected TBs, the differences between the transformed coefficients are equal or close to zero. Utilizing this difference statistics, a multilevel watermarking is inserted in the compressed video by altering near zeros values in the difference transformed coefficients. A comprehensive set of experiments have been carried out to justify the efficacy of the proposed scheme over existing literature.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126825295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast mode decision for HEVC intra coding with efficient mode skipping and improved RMD","authors":"Xin Lu, Nan Xiao, Yue Hu, Zhilu Wu, G. Martin","doi":"10.1109/MMSP.2016.7813355","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813355","url":null,"abstract":"HEVC employs a quad-tree based Coding Unit (CU) structure to achieve a significant improvement in coding efficiency compared with previous standards. However, the computational complexity is greatly increased. We proposed a fast mode decision algorithm to reduce intra coding complexity. Firstly, an initial candidate list of intra modes is constructed for each Prediction Unit (PU). The prediction mode correlation between adjacent quad-tree coding levels and between temporal neighbouring frames is used to predict the most likely coding mode. The number of prediction mode that need to be evaluated in residual quad-tree (RQT) process is further reduced by taking the Hadamard cost of prediction mode into consideration. Simulation results show that the proposed algorithm saves encoding time by up to 51% compared with the HM 13.0 implementation, while having a negligible impact on rate distortion.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130458022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A quantitative real time data analysis in vehicular speech environment with varying SNR
Authors: Sai Prithvi Gadde, Sam Tabaja, Philip Olivier, N. Jaber, Mahdi Ali, R. Chabaan, Scott Bone
Venue: 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), September 2016
DOI: https://doi.org/10.1109/MMSP.2016.7813375
Abstract: The purpose of this paper is to compare the performance of two common filters operating on noisy speech recorded in automobiles travelling at various speeds. The filters are based on Spectral Subtraction (SS) and Kalman Filtering (KF). The literature contains studies based on simulated data, whereas this paper uses real-time data collected in cars in search of an optimal solution. The comparisons are based on real recorded samples containing noisy speech signals, each approximately 2 minutes long. Different noise-level cases representing the most common situations experienced by drivers were created, with settings that include varying car speed (e.g., 40 mph, 70 mph), fan power, and window position. The study was carried out using three different car models. The measured noisy speech signals were filtered using the two techniques, and the resulting filtered signals were compared in the time domain and the frequency domain, both quantitatively and psychometrically. Furthermore, the quantitative analysis approach was applied to the results for more accurate interpretation. Results show that SS outperforms KF in noise reduction, with much less speech distortion at the different Signal-to-Noise Ratios (SNRs) tested. The results of the human listening tests are comparable with the simulation results. Overall, SS showed superior performance over KF for vehicular hands-free speech applications.
{"title":"Image coding using parametric texture synthesis","authors":"Uday Singh Thakur, Bappaditya Ray","doi":"10.1109/MMSP.2016.7813339","DOIUrl":"https://doi.org/10.1109/MMSP.2016.7813339","url":null,"abstract":"Visual textures like grass, water etc. consist of dense and random variations in contrast that are perceptually indistinguishable by a human eye. Such textures are costly to encode using image and video codecs. For example, in the state-of-the-art compression standard High Efficiency Video Coding (HEVC), detailed textures typically show relatively strong blurring artifacts at low rates (high QPs). Texture synthesis is a process whereby one can obtain a reconstruction of a visually equivalent texture with decent visual quality, given a set of parameters. In this paper, texture synthesis is used as a tool in combination with HEVC, exploiting Human Visual Perception (HVP) properties by creating an artificial textured content using model parameters at the decoder side. A novel scheme for compression (prediction and quantization) of parameters for complex wavelet based texture synthesis is introduced. The compressed parameters are sufficient to synthesize high quality texture content at the decoder side. Simulation results have shown, that with same rates, both the subjective and the objective quality is enhanced, compared to HEVC.","PeriodicalId":113192,"journal":{"name":"2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133435424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}