{"title":"Blind low-complexity estimation of reverberation time","authors":"Christian Schüldt, P. Händel","doi":"10.1109/WASPAA.2013.6701875","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701875","url":null,"abstract":"Real-time blind reverberation time estimation is of interest in speech enhancement techniques such as e.g. dereverberation and microphone beamforming. Advances in this field have been made where the diffusive reverberation tail is modeled and the decay rate is estimated using a maximum-likelihood approach. Various methods for reducing the computational complexity have also been presented. This paper proposes a method for even further computational complexity reduction, by more than 60% in some cases, and it is shown through simulations that the results of the proposed method are very similar to that of the original.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"74 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114036140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The influence of informational masking in complex real-world environments","authors":"Adam Westermann, J. Buchholz","doi":"10.1109/WASPAA.2013.6701873","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701873","url":null,"abstract":"Spatial release from masking (SRM) is believed to be an essential auditory mechanism aiding listeners in reverberant multi-talker environments. However, SRM is often measured in simplified spatial configurations using speech corpora with exaggerated talker and/or context confusions. Besides energetic better-ear listening and binaural unmasking, the perceived spatial separation of target and masking speech signals is thought to aid listener's segregation of speech signals, resulting in a so-called release from informational masking. This study aims to estimate the amount of informational masking that is apparent in complex real-world environments. Speech reception thresholds (SRTs) were measured by presenting Bamford-Kowal-Bench (BKB) sentences in a simulated cafeteria environment recreated by a spherical array of 41 loudspeakers placed in an anechoic chamber. Three maskers with varying degree of informational masking were realized: one with talkers different from the target, one with an unintelligible noise vocoder (minimal informational masking) and one with the same talker as the target (maximum informational masking). The maskers were constructed with either two or seven two-talker conversations and were either spatially distributed in the simulated cafeteria or colocated with the target. Seven normal hearing listeners were tested. All conditions showed improved thresholds for the spatialized condition compared to the colocated condition. However there was no significant difference between the different talker speech and vocoded masker. Only the same talker masker showed increased thresholds and this was only substantial in the two conversation colocated condition. 
These results suggest that informational masking is of low relevance in real-life listening and is exaggerated in listening tests by target/masker similarities and the colocated spatial configuration. However, this may be different in (aided) hearing impaired listeners where spectral and spatial cues can be significantly disturbed.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114282590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-artifact source separation using probabilistic latent component analysis","authors":"N. Mohammadiha, P. Smaragdis, A. Leijon","doi":"10.1109/WASPAA.2013.6701837","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701837","url":null,"abstract":"We propose a method based on the probabilistic latent component analysis (PLCA) in which we use exponential distributions as priors to decrease the activity level of a given basis vector. A straightforward application of this method is when we try to extract a desired source from a mixture with low artifacts. For this purpose, we propose a maximum a posteriori (MAP) approach to identify the common basis vectors between two sources. A low-artifact estimate can now be obtained by using a constraint such that the common basis vectors in the interfering signal's dictionary tend to remain inactive. We discuss applications of this method in source separation with similar-gender speakers and in enhancing a speech signal that is contaminated with babble noise. Our simulations show that the proposed method not only reduces the artifacts but also increases the overall quality of the estimated signal.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132757222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spotforming using distributed microphone arrays","authors":"Maja Taseska, Emanuël Habets","doi":"10.1109/WASPAA.2013.6701876","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701876","url":null,"abstract":"Extracting sounds that originate from a specific location, while reducing noise and interferers is required in many hands-free communications systems. We propose a spotforming approach that uses distributed microphone arrays and aims at extracting sounds that originate from a pre-defined spot of interest (SOI), while reducing background noise and sounds that originate from outside the SOI. The spotformer is realized as a linear spatial filter, which is based on the signal statistics of sounds from the SOI, the signal statistics of sounds outside the SOI and the background noise signal statistics. The required signal statistics are estimated from the microphone signals, while taking into account the uncertainty in the location estimates of the desired and the interfering sound sources. The applicability of the method is demonstrated by simulations and the quality of the extracted signal is evaluated in different scenarios.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"114 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113993936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating how well filtered white noise models the residual from sinusoidal modeling of musical instrument sounds","authors":"Marcelo F. Caetano, George P. Kafentzis, G. Degottex, A. Mouchtaris, Y. Stylianou","doi":"10.1109/WASPAA.2013.6701840","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701840","url":null,"abstract":"Nowadays, sinusoidal modeling commonly includes a residual obtained by the subtraction of the sinusoidal model from the original sound. This residual signal is often further modeled as filtered white noise. In this work, we evaluate how well filtered white noise models the residual from sinusoidal modeling of musical instrument sounds for several sinusoidal algorithms. We compare how well each sinusoidal model captures the oscillatory behavior of the partials by looking into how “noisy” their residuals are. We performed a listening test to evaluate the perceptual similarity between the original residual and the modeled counterpart. Then we further investigate whether the result of the listening test can be explained by the fine structure of the residual magnitude spectrum. The results presented here have the potential to subsidize improvements on residual modeling.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116128152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A recursive generalized sidelobe canceler for multichannel blind speech dereverberation","authors":"S. Malik, J. Benesty, Jingdong Chen","doi":"10.1109/WASPAA.2013.6701814","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701814","url":null,"abstract":"In this paper, we propose a generalized sidelobe canceler for multichannel blind speech dereverberation, which relies on recursive estimation of posterior distributions on the unknown acoustic channels and the adaptive interference canceler (AIC). Contrary to conventional design approaches where a fixed beamformer is employed, we consider a marginalized maximum-likelihood equalizer that is driven by the channel posterior estimator. It is shown that the first moment of the inferred channel posterior can also serve as a representation of an adaptive blocking matrix (ABM). Using the output of the blocking matrix, we estimate the AIC posterior to minimize the residual reverberation in the equalized signal. We demonstrate the efficacy of our approach by evaluating the algorithm in different degrees of observation noise and varying reverberation times.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130134300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recurrence quantification analysis features for environmental sound recognition","authors":"Gerard Roma, Waldo Nogueira, P. Herrera","doi":"10.1109/WASPAA.2013.6701890","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701890","url":null,"abstract":"This paper tackles the problem of feature aggregation for recognition of auditory scenes in unlabeled audio. We describe a new set of descriptors based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for environmental audio recognition combined with traditional feature statistics in the context of the AASP D-CASE[1] challenge. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"28 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124549716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gentle acoustic crosstalk cancelation using the spectral division method and Ambiophonics","authors":"J. Ahrens, Mark R. P. Thomas, I. Tashev","doi":"10.1109/WASPAA.2013.6701827","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701827","url":null,"abstract":"We propose the concept of gentle acoustic crosstalk cancelation, which aims at reducing the crosstalk between a loudspeaker and the listener's contralateral ear instead of eliminating it completely as aggressive methods intend to do. The expected benefit is higher robustness and a tendency to collapse less unpleasantly. The proposed method employs a linear loudspeaker array and exhibits two stages: 1) Use the Spectral Division Method to illuminate the ipsilateral ear using constructive interference of the loudspeaker signals. This approach provides only little channel separation between the listener's ears at frequencies below approximately 2000 Hz. 2) There we additionally use destructive interference by Recursive Ambiophonics Crosstalk Elimination (RACE). RACE was chosen because of its tendency to collapse gently. In a sample scenario with realistic parameters, the proposed method achieves around 20 dB of channel separation between 700 Hz and 9000 Hz, which appears to be sufficient to achieve full perceived lateralization when only one ear is illuminated.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124405847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music self-similarity modeling using augmented nonnegative matrix factorization of block and stripe patterns","authors":"J. Kauppinen, Anssi Klapuri, T. Virtanen","doi":"10.1109/WASPAA.2013.6701855","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701855","url":null,"abstract":"Self-similarity matrices have been widely used to analyze the sectional form of music signals, e.g. enabling the detection of parts such as verse and chorus in popular music. Two main types of structures often appear in self-similarity matrices: rectangular blocks of high similarity and diagonal stripes off the main diagonal that represent recurrent sequences. In this paper, we introduce a novel method to model both the block and stripe-like structures in self-similarity matrices and to pull them apart from each other. The model is an extension of the nonnegative matrix factorization, for which we present multiplicative update rules based on the generalized Kullback-Leibler divergence. The modeling power of the proposed method is illustrated with examples, and we demonstrate its application to the detection of sectional boundaries in music.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130343565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Closed-form solutions for robust acoustic sensor localization","authors":"D. B. Haddad, Leonardo O. Nunes, W. Martins, L. Biscainho, Bowon Lee","doi":"10.1109/WASPAA.2013.6701810","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701810","url":null,"abstract":"This paper deals with the localization of acoustic sensors based on signals emitted by loudspeakers at known positions. In particular, a model for distortions in time-of-flight (TOF) estimates applicable to the sensor localization problem is presented along with closed-form solutions with low computational cost. The proposed techniques are able to approximate the sensor position even when the TOFs are corrupted by an unknown delay, there is a sampling frequency mismatch between the A/D and D/A converters associated with sensor and loudspeakers, and the speed of sound is unknown. Simulations and an experiment on real data demonstrate that the proposed methods are able to estimate sensor positions with less than 2 cm of error in the evaluated scenarios.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133376677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}