{"title":"Rate-distortion optimization for multichannel audio compression","authors":"Minyue Li, J. Skoglund, W. Kleijn","doi":"10.1109/WASPAA.2013.6701839","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701839","url":null,"abstract":"Multichannel audio coding is studied from a rate-distortion theoretical viewpoint. Two practical coding techniques, both of which are based on rate-distortion optimization, are also proposed. The first technique decorrelates a multichannel signal hierarchically using elementary unitary transforms. The second method rearranges a multichannel signal into sub-signals and compresses them at optimized bit rates using a conventional codec. Both objective and subjective tests were conducted to illustrate the efficiency of the methods.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133465472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust DOA estimation of speech signals via sparsity models using microphone arrays","authors":"Eleonora Cagli, Diego Carrera, G. Aletti, G. Naldi, B. Rossi","doi":"10.1109/WASPAA.2013.6701823","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701823","url":null,"abstract":"Direction-of-arrival (DOA) estimation of speech signals using a set of spatially separated microphones in an array is a problem arising in many practical applications. Examples include human-computer interfaces, automatic camera-steering systems for multiparticipant videoconferencing, and tracking systems in smart home environments. This paper introduces a robust method for speech signal localization which makes use of sparsity models for signal representation, and includes an analysis of the denoising problem for realistic applications using MEMS microphone arrays. Experimental results on both synthetic and real speech data show that the proposed method is noise-robust and provides highly reliable localization performance even with multiple sources and a small number of microphones.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115845915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptual Cepstral filters for speech and music processing","authors":"R. Mignot, V. Välimäki","doi":"10.1109/WASPAA.2013.6701858","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701858","url":null,"abstract":"Source-filter modeling of speech or musical tones requires a filter model for the spectral envelope of the signal. To reduce the number of modeling parameters, one idea is to use psychoacoustic knowledge to encode only the information that is perceptually relevant. Starting from an accurate estimate of the original spectral envelope, which contains imperceptible details, we propose to use its Mel-Frequency Cepstral Coefficient (MFCC) representation to capture the perceptually relevant information. Then, a new inverse process is presented to derive a smoother, but perceptually equivalent spectral envelope. For instance, this new method can be applied in speech coding, and thanks to the good properties of the MFCC representation, perceptual interpolation of sounds is made easier.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126990325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MINTFormer: A spatially aware channel equalizer","authors":"Felicia Lim, Mark R. P. Thomas, P. Naylor","doi":"10.1109/WASPAA.2013.6701881","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701881","url":null,"abstract":"Reverberation is a process that distorts a wanted signal and impairs perceived speech quality. In the context of multichannel dereverberation, channel-based methods and beamforming are two common approaches. Channel-based methods such as the multiple input/output inverse theorem (MINT) can provide perfect dereverberation provided the exact acoustic impulse responses (AIRs) are known. However, they have been shown to be very sensitive to AIR estimation errors for which several modifications have consequently been proposed. Conversely, beamformers are significantly more robust but provide comparatively modest dereverberation. While the two approaches are conventionally considered independent, both can be formulated as a filter-and-sum operation with differing filter design criteria. We propose a unified framework, termed MINT-Forming, that exploits this similarity and introduces a mixing parameter to control the tradeoff between the potential performance of MINT and the robustness of beamforming. Empirical results show that the mixing parameter is a monotonic function of channel estimation error, whereby a MINT solution is preferred when channel estimation error is low.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124812059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency domain multi-channel expectation maximization algorithm for audio background noise reduction","authors":"Jichi Deng, S. Godsill","doi":"10.1109/WASPAA.2013.6701859","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701859","url":null,"abstract":"In this paper we implement expectation maximization (EM) based methods in the short time Fourier transform (STFT) domain for background noise reduction in multi-channel systems. The models introduce a Wishart prior for the unknown signal covariance matrix. An EM algorithm is used to maximise the posterior probability for the clean signal, approaching a stationary point of the distribution with increasing iterations. The background noise is modelled as white and stationary in this initial work. The proposed methods are found to outperform a multi-channel Wiener filter in terms of residual noise artefacts and MSE for a small initial trial.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121679005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gaussian process data fusion for heterogeneous HRTF datasets","authors":"Yuancheng Luo, D. Zotkin, R. Duraiswami","doi":"10.1109/WASPAA.2013.6701842","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701842","url":null,"abstract":"Head-Related Transfer Function (HRTF) measurement and extraction are important tasks for personalized-spatial audio. Many laboratories have their own apparatuses for data-collection but few studies have compared their results to a common subject or have modeled inter-dataset variances. We present a Bayesian fusion method based on Gaussian process (GP) modeling of joint spatial-frequency HRTFs over different spherical-measurement grids. Neumann KU-100 dummy HRTFs from 7 labs in the “Club Fritz” study are compared and fused to each other based on learning a set of transformations from the GP data-likelihood and covariance assumptions; parameter and hyperparameter training is automatic. Experimental results show that fused models for horizontal and median-plane HRTFs generalize the datasets better than pre-transformed ones.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128360816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new clustering approach for solving the permutation problem in convolutive blind source separation","authors":"Radoslaw Mazur, J. Jungmann, A. Mertins","doi":"10.1109/WASPAA.2013.6701852","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701852","url":null,"abstract":"In this paper we propose a new clustering approach for solving the permutation ambiguity in convolutive blind source separation. After the transformation to the time-frequency domain, the problem of separation of sources can be reduced to multiple instantaneous problems, which may be solved using independent component analysis. The drawbacks of this approach are the inherent permutation and scaling ambiguities, which have to be corrected before the transformation to the time domain. Here, we propose a new method that allows for aligning up to several hundreds of consecutive bins into clusters. The depermutation of these clusters using some known techniques is then much easier than the original problem. The performance of the proposed method is evaluated on real-room recordings.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133960881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Room impulse response synthesis based on a 2D multi-plane FDTD hybrid acoustic model","authors":"Stephen Oxnard, D. Murphy","doi":"10.1109/WASPAA.2013.6701887","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701887","url":null,"abstract":"This paper presents, and analyzes the validity of, a novel hybrid acoustic modeling system created through complementary assimilation of 3D geometric and 2D numerical modeling techniques. It is demonstrated that multiple 2D Finite Difference Time Domain schemes may be employed to simulate low-frequency sound wave propagation throughout a simplistic 3D enclosure, thus avoiding the immense computational challenges posed by 3D numerical approaches. Band-limited room impulse responses (RIRs) generated in this way may be appropriately calibrated and combined with high-frequency results obtained from well-established geometric modeling methods to realize efficient, yet accurate hybrid RIR synthesis. Objective results show that the low-frequency 2D multiplane solution yields comparable accuracy to that gained through 3D simulation while achieving a run-time reduction of 99.15%.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133411703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wave-domain echo-path model with aliasing for echo cancellation","authors":"S. Emura, Y. Hiwasaki, H. Ohmuro","doi":"10.1109/WASPAA.2013.6701844","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701844","url":null,"abstract":"Wave-domain adaptive filtering for echo cancellation has been proposed for achieving immersive full-duplex sound conferencing that uses wave field reconstruction as spatial sound rendering. In wave-domain adaptive filtering, fundamental solutions of the wave equation are spatially sampled and used as the orthogonal basis functions. This sampling is determined by loudspeaker spacing and results in aliasing; aliasing occurs above a few thousand Hz for spacing of several centimeters. The goal of this work is to investigate the effect of applying adaptive filtering on echo signal with aliasing when the loudspeaker array and microphone array are uniform linear arrays of identical geometries. We came to the conclusion that we can apply the wave-domain echo-path model, used below spatial Nyquist frequency, to wave-domain adaptive filtering over this frequency even in the presence of aliasing components.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114962131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced speech-audio processing in mobile phones and hearing aids: Synergies and distinctions","authors":"P. Vary","doi":"10.1109/WASPAA.2013.6701899","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701899","url":null,"abstract":"Summary form only given. Mobile phones and modern hearing aids comprise advanced digital signal processing techniques as well as coding algorithms. From a functional point of view, digital hearing devices and mobile phones are approaching each other. In both types of devices similar or partly even identical algorithms can be found such as echo, reverberation and feedback control, noise reduction, intelligibility enhancement, artificial bandwidth extension, and binaural processing with two or more microphones. Current hearing aids include digital audio receivers and transmitters not only for communication and entertainment but also for binaural directional processing. State-of-the-art mobile phones offer new speech-audio compression schemes for the emerging HD-telephone services and they are equipped with two (or more) microphones for the purpose of speech enhancement. Thus, it is not too big a step to realize hearing aid features as apps on smart phones. The further evolution might lead us to binaural mobile telephony, providing ambient and spatial information - a preferred solution for audio conferencing, for example. Despite these relations, the signal conditions and the processing constraints are quite different, e.g., with respect to coherence of signals, complexity of algorithms, coding-noise shaping for binaural processing, power consumption, and latency. Synergies and distinctions of the corresponding signal processing and coding algorithms will be discussed. Design constraints and solutions will be presented by examples.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131832327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}