{"title":"Spatial Sampling and Beamforming for Spherical Microphone Arrays","authors":"B. Rafaely","doi":"10.1109/HSCMA.2008.4538673","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538673","url":null,"abstract":"Spherical microphone arrays have been recently studied for spatial sound recording, speech communication, and sound field analysis for room acoustics and noise control. Complementary theoretical studies presented progress in spatial sampling and beamforming methods. This paper reviews recent results in spatial sampling that facilitate a wide range of spherical array configurations, from a single rigid sphere to free positioning of microphones. The paper then presents an overview of beamforming methods recently presented for spherical arrays, from the widely used delay-and-sum and Dolph-Chebyshev, to the more advanced optimal methods, typically performed in the spherical harmonics domain.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129004581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Auto-Focusing Wideband Bayesian Beamforming","authors":"Tao Yu, J. Hansen","doi":"10.1109/HSCMA.2008.4538688","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538688","url":null,"abstract":"The problem of uncertain direction-of-arrival (DOA) for narrowband sources has been addressed using adaptive Bayesian beamforming [2,3]. In this study, we present a wideband Bayesian beamforming technique based on the coherent signal-subspace transform (CSST). CSST focuses the wideband data onto a single narrowband to allow for a narrowband Bayesian beamformer, which in turn provides the data-driven DOA information needed to update the key part of CSST, the focusing matrix. Numerical simulations with array data show that the proposed beamformer is robust to both DOA mismatch and coherent wideband interferences.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115258017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distant Speech Recognition: Bridging the Gaps","authors":"John McDonough, Matthias Wölfel","doi":"10.1109/HSCMA.2008.4538699","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538699","url":null,"abstract":"While great progress has been made in both fields, there is currently a relatively large rift between researchers engaged in acoustic array processing and those engaged in automatic speech recognition. This is unfortunate for many reasons, but most of all because it prevents the two sides, both of whom are investigating different aspects of the same problem, from truly understanding one another and cooperating. In many cases, the two sides see each other through the eyes of strangers. If ground breaking progress is to be made in the emerging field of distant speech recognition (DSR), this abysmal state of affairs must change. In this work, we outline five pressing problems in the DSR research field, and we make initial proposals for their solutions. The problems discussed here are by no means the only ones that must be solved in order to construct truly effective DSR systems. Nonetheless, their solution, in our view, will represent significant first steps towards this goal, inasmuch as the solution of each of these problems will require a substantial change in the mind-sets and thought patterns of those engaged in this field of research.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125895974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancement of Sounds in a Specific Directional Area Using Power Spectra Estimated from Multiple Beamforming Outputs","authors":"Y. Hioka, K. Kobayashi, K. Furuya, A. Kataoka","doi":"10.1109/HSCMA.2008.4538685","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538685","url":null,"abstract":"In this paper, a method for picking up sounds located in a particular range of angles is proposed. The structure of the method is based on beamforming with postfiltering. The main part of our proposal is introducing a scheme to estimate the power spectra of both the desired signals and noises, which are used to derive the Wiener postfilter. From the results of experiments in a reverberant chamber, we have confirmed that the proposed method succeeds in suppressing the noise level by more than 10 dB even in a practical situation.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131613938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binaural and Multiple-Microphone Signal Processing Motivated by Auditory Perception","authors":"R. Stern, E. Gouvêa, Chanwoo Kim, K. Kumar, Hyung-Min Park","doi":"10.1109/HSCMA.2008.4538697","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538697","url":null,"abstract":"It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving the intelligibility of speech in reverberant environments. This paper describes and compares a number of ways in which the classic model of interaural cross-correlation proposed by Jeffress, quantified by Colburn, and further elaborated by Blauert, Lindemann, and others, can be applied to improving the accuracy of automatic speech recognition systems operating in cluttered, noisy, and reverberant environments. Typical implementations begin with an abstraction of cross-correlation of the incoming signals after nonlinear monaural bandpass processing, but there are many alternative implementation choices that can be considered. Typical implementations differ in the ways in which an enhanced version of the desired signal is developed using binaural principles, in the extent to which specific processing mechanisms are used to impose suppression motivated by the precedence effect, and in the precise mechanism used to extract interaural time differences.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"456 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115100423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study of Speech Intelligibility in Noisy Enclosures Using Spherical Microphone Arrays","authors":"Yotam Peled, B. Rafaely","doi":"10.1109/HSCMA.2008.4538711","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538711","url":null,"abstract":"Detection of clear speech in highly reverberant and noisy enclosures is an extremely difficult problem. Recently, spherical microphone arrays have been studied that are suitable for noise reduction and de-reverberation in three dimensions. This paper presents the development of a model for investigating speech intelligibility in noisy enclosures when speech is recorded and processed by spherical microphone arrays. The model uses the image method, diffuse sound fields, spherical array beamforming, and speech intelligibility measures to predict the array order required to overcome noise and reverberation when detecting speech in noisy enclosures. Having such a model, one can design a spherical array that overcomes given acoustic conditions, or assess whether a given problem can be solved by a practical array configuration.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"1977 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132703336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Far-Field Multimodal Speech Processing and Conversational Interaction in Smart Spaces","authors":"G. Potamianos, Jing Huang, E. Marcheret, V. Libal, R. Balchandran, M. Epstein, L. Serédi, M. Labský, L. Ures, M. Black, P. Lucey","doi":"10.1109/HSCMA.2008.4538701","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538701","url":null,"abstract":"Robust speech processing constitutes a crucial component in the development of usable and natural conversational interfaces. In this paper we are particularly interested in human-computer interaction taking place in \"smart\" spaces - equipped with a number of far-field, unobtrusive microphones and camera sensors. Their availability allows multi-sensory and multi-modal processing, thus improving robustness of speech-based perception technologies in a number of scenarios of interest, for example lectures and meetings held inside smart conference rooms, or interaction with domotic devices in smart homes. In this paper, we overview recent work at IBM Research in developing state-of-the-art speech technology in smart spaces. In particular we discuss acoustic scene analysis, speech activity detection, speaker diarization, and speech recognition, emphasizing multi-sensory or multi-modal processing. The resulting technology is envisaged to allow far-field conversational interaction in smart spaces based on dialog management and natural language understanding of user requests.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116570180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Room Acoustics Parameters Affecting Speaker Recognition Degradation Under Reverberation","authors":"I. Peer, B. Rafaely, Y. Zigel","doi":"10.1109/HSCMA.2008.4538705","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538705","url":null,"abstract":"The performance of speaker recognition systems may degrade significantly when speech is recorded in reverberant environments by a microphone positioned far from the speaker. Most of the literature on speaker recognition uses the reverberation time to classify the reverberation effects. However, as described in this work, the reverberation time is mainly a room feature and is less affected by the distance between the source and the microphone. This paper presents a comprehensive study of room acoustics parameters and their relationship with speaker recognition performance. The definition and centre-time, acoustic parameters that are affected by both room properties and distance, were found to be more correlated with the degradation in speaker recognition performance.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"872 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131820361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-Domain Blind Audio Source Separation Using Advanced Component Clustering and Reconstruction","authors":"Zbyněk Koldovský, P. Tichavský","doi":"10.1109/HSCMA.2008.4538725","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538725","url":null,"abstract":"We present a novel time-domain method for blind separation of convolutive mixtures of audio sources (the cocktail party problem). The method allows efficient separation with good signal-to-interference ratio (SIR) and signal-to-distortion ratio (SDR) using only short data segments. In practice, we are able to separate 2-4 speakers from an audio recording of fewer than 6000 samples, which is less than 1 s at 8 kHz sampling. The average time needed to process the data with a filter of length 20 was 2.2 seconds in Matlab v. 7.2 on an ordinary PC with a 3 GHz processor.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133627948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Maximum a Posteriori Post-Filtering for Arbitrary Beamforming","authors":"T. Wolff, M. Buck","doi":"10.1109/HSCMA.2008.4538686","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538686","url":null,"abstract":"We present a new approach for residual transient noise suppression at the output of an arbitrary beamformer. A spatial optimum estimate for the instantaneous a posteriori SNR is derived on the basis of the output signals of a blocking matrix. The optimization problem is formulated in the logarithmic domain and statistical models for the obtained quantities are given. Based on these models the optimization problem is solved in the maximum a posteriori sense. It is shown that the performance of speech recognition systems in non-stationary noise scenarios is improved considerably compared to the performance achieved with a Wiener filter applied to the beamformer output.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133961994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}