2008 Hands-Free Speech Communication and Microphone Arrays: Latest Publications

Perceptually-Motivated Nonlinear Channel Decorrelation for Stereo Acoustic Echo Cancellation
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538718
J. Valin
Abstract: Acoustic echo cancellation with stereo signals is generally an under-determined problem because of the high coherence between the left and right channels. In this paper, we present a novel method of significantly reducing inter-channel coherence without affecting the audio quality. Our work takes into account psychoacoustic masking and binaural auditory cues. The proposed non-linear processing combines a shaped comb-allpass (SCAL) filter with the injection of psychoacoustically masked noise. We show that the proposed method performs significantly better than other known methods for reducing inter-channel coherence.
Citations: 14
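The abstract's premise is that stereo AEC is ill-conditioned when the two loudspeaker channels are highly coherent. A minimal sketch of how inter-channel coherence can be measured, assuming Welch-style averaging of Hann-windowed frames; the function name, frame length and hop are illustrative choices, and this does not implement the paper's SCAL filter or masked-noise injection:

```python
import numpy as np


def interchannel_coherence(left, right, frame_len=512, hop=256):
    """Magnitude-squared coherence per frequency bin, in [0, 1].

    Auto- and cross-spectra are averaged over Hann-windowed frames
    (Welch-style) before forming |S_lr|^2 / (S_ll * S_rr). The averaging
    is essential: for a single frame the ratio is always 1.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    window = np.hanning(frame_len)
    n_bins = frame_len // 2 + 1
    s_ll = np.zeros(n_bins)                      # left auto-spectrum
    s_rr = np.zeros(n_bins)                      # right auto-spectrum
    s_lr = np.zeros(n_bins, dtype=complex)       # cross-spectrum
    n_frames = 1 + (len(left) - frame_len) // hop
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        L = np.fft.rfft(window * left[seg])
        R = np.fft.rfft(window * right[seg])
        s_ll += np.abs(L) ** 2
        s_rr += np.abs(R) ** 2
        s_lr += L * np.conj(R)
    return np.abs(s_lr) ** 2 / (s_ll * s_rr + 1e-12)
```

Identical channels give values close to 1 in every bin; mixing independent noise into one channel lowers the estimate, which is the effect a decorrelation scheme aims for while keeping the added signal inaudible.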
HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1093/ietisy/e91-d.9.2360
J. Park, J. Yoon, H. Kim
Abstract: In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) in order to incorporate the observation that the mask information should be correlated over contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented as the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask information is finally employed to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we then compare the performance of the proposed method with that of a Gaussian kernel-based estimation method in terms of speech recognition performance. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 69.14% compared with the Gaussian kernel-based mask estimation method.
Citations: 8
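The abstract motivates the HMM by the observation that mask values are correlated across contiguous frames. As a rough illustration of that idea only, the hypothetical sketch below runs Viterbi decoding over a two-state (noise-dominated / target-dominated) chain for one frequency channel, assuming per-frame log-likelihoods derived from ITD/ILD cues are already available. The function name and the symmetric transition probability `p_stay` are assumptions; the paper's actual model and features may differ:

```python
import numpy as np


def viterbi_binary_mask(log_lik, p_stay=0.9):
    """Most likely 0/1 mask sequence for one frequency channel.

    log_lik: array of shape (T, 2); log_lik[t, s] is the log-likelihood of the
             frame-t observation under state s (0 = noise-dominated,
             1 = target-dominated), e.g. computed from ITD/ILD cues.
    p_stay:  assumed probability that the mask keeps its value between frames.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    T = log_lik.shape[0]
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    score = np.log(0.5) + log_lik[0]             # uniform initial state distribution
    back = np.zeros((T, 2), dtype=int)           # best predecessor of each state
    for t in range(1, T):
        cand = score[:, None] + log_trans        # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], [0, 1]] + log_lik[t]
    # Trace the best path backwards from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

With `p_stay` close to 1, an isolated frame-wise decision that contradicts its neighbors is overridden unless the evidence for it is strong.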
Beamforming with Optimized Interpolated Microphone Arrays
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538681
G. Doblinger
Abstract: We present an optimization procedure for wideband beamforming with interpolated arrays. We intend to design a beamformer with a compact size. In addition, we want to reduce the number of sensors while maintaining good beamforming performance. Our beamformers are implemented using FFT filterbanks. Performance is tested under far-field conditions and under sound propagation with simulated room impulse responses. In addition, we study the influence of sensor noise on the beamforming behavior.
Citations: 9
Receive Side Processing for Automotive Hands-Free Systems
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538730
B. Iser, G. Schmidt
Abstract: In the sending path of automotive hands-free systems, several subunits, such as acoustic echo cancellation (AEC) and noise reduction (NR), improve the quality of the outgoing signal. These units are usually realized in the frequency or subband domain in order to reduce the computational complexity. In the receiving path, however, only a few signal processing stages, such as bandwidth extension (BWE) [1] or gain adjustment, are realized in recent systems [2, 3]. In most cases these units are implemented in the time domain, since two analysis-synthesis schemes (one in the sending and one in the receiving path) would introduce more delay than allowed by ITU or VDA recommendations [4]. To the best of the authors' knowledge, linking the conventional processing schemes of the sending path (AEC and NR) with those of the receiving path has not yet been addressed in research on hands-free systems. In the car environment, some amplifier manufacturers control the volume depending on the driving speed of the car. Some even offer the possibility of placing a microphone in the cabin to measure the noise level within the car [2, 5]. But this does not apply to hands-free telephony. The power spectral density (PSD) of the background noise, already estimated within the NR unit, can be used to adjust the BWE unit. Since artifacts introduced by a BWE scheme are less audible in high-noise conditions, a stronger extension can be used than when the car is at a standstill. By also taking the estimated echo spectrum into account (besides the noise PSD), an estimate of the SNR within the car cabin can be obtained.
Using this estimate, an automatic gain control of the receive signal could maintain a particular SNR within the car while the noise or the speaking level of the remote partner changes. This can also be done in a frequency-specific manner, resulting in a frequency-selective adaptive equalization. No additional microphone has to be placed in the cabin, and the volume can be controlled independently of the amplifier using the resources (AEC, NR) already available.
Citations: 3
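The proposed receive-side control can be read as: estimate the in-cabin SNR per band from the echo PSD (supplied by the AEC) and the noise PSD (supplied by the NR), then scale the receive signal so that this SNR stays near a target. A minimal sketch under assumptions not stated in the abstract: a linear echo path, so an amplitude gain g scales the echo PSD by g², plus a hypothetical target SNR and gain limits. The function name and all default values are illustrative:

```python
import numpy as np


def receive_gain(echo_psd, noise_psd, target_snr_db=10.0,
                 min_gain_db=-6.0, max_gain_db=12.0):
    """Per-band amplitude gain for the receive (loudspeaker) path.

    echo_psd:  estimated PSD of the echo, i.e. the far-end signal as it
               arrives at the cabin microphone (assumed to come from the AEC).
    noise_psd: estimated background-noise PSD (assumed to come from the NR).
    Returns the gain that would bring echo/noise to the target SNR in each
    band, clipped to [min_gain_db, max_gain_db].
    """
    eps = 1e-12
    echo_psd = np.asarray(echo_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    snr = (echo_psd + eps) / (noise_psd + eps)       # current in-cabin SNR per band (power ratio)
    target = 10.0 ** (target_snr_db / 10.0)          # target SNR as a power ratio
    # A power change of target/snr is 10*log10(target/snr) dB, which equals
    # the required amplitude gain expressed in dB.
    gain_db = np.clip(10.0 * np.log10(target / snr), min_gain_db, max_gain_db)
    return 10.0 ** (gain_db / 20.0)                  # linear amplitude gain per band
```

Computing the gain per band yields the frequency-selective equalization the abstract mentions; a practical system would likely also smooth these gains over time and frequency, which is omitted here.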
Microphone Array Front-End Interface for Home Automation
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538717
G.E. Coelho, A. Serralheiro, J.P. Netti
Abstract: In this paper we present a microphone array (MA) interface to a spoken dialog system. Our goal is to create a hands-free home automation system with a vocal interface to control home devices. The user establishes a dialog with a virtual butler that is able to control a plethora of home devices, such as ceiling lights, air-conditioner, window shades, hi-fi and TV features. An MA is used for the speech acquisition front-end. The multichannel audio is pre-processed in real time, performing speech enhancement with a delay-and-sum beamforming algorithm. The direction of arrival is estimated with the Generalized Cross Correlation with Phase Transform (GCC-PHAT) algorithm, enabling us to track the user. The enhanced speech signal is then processed in order to recognize orally issued commands that control the house appliances. This paper describes the complete system, emphasizing the MA and its implications for command recognition performance.
Citations: 4
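The front-end described above enhances speech with delay-and-sum beamforming once the direction of arrival is known. A minimal frequency-domain sketch, assuming the per-microphone arrival delays are already available (for example from a GCC-PHAT estimate) and accepting a circular shift for simplicity; this is a generic textbook form with an illustrative function name, not the authors' real-time implementation:

```python
import numpy as np


def delay_and_sum(signals, delays, fs):
    """Frequency-domain delay-and-sum beamformer.

    signals: array of shape (M, N), one row per microphone.
    delays:  length-M sequence of arrival delays in seconds at each microphone,
             relative to a reference (positive = the target arrives later).
    fs:      sampling rate in Hz.
    """
    signals = np.asarray(signals, dtype=float)
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)            # bin frequencies in Hz
    spectra = np.fft.rfft(signals, axis=1)
    tau = np.asarray(delays, dtype=float)[:, None]
    # Advancing a channel by tau seconds multiplies its spectrum by
    # exp(+j*2*pi*f*tau). The shift is circular, acceptable for this sketch.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau)
    # Average the aligned channels and return to the time domain.
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```

Aligned target components add coherently while uncorrelated sensor noise is averaged, so with M microphones the noise power drops by roughly a factor of M.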
Speech Dereverberation in Short Time Fourier Transform Domain with Crossband Effect Compensation
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538726
T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, B. Juang
Abstract: It has recently been shown that the maximum likelihood estimation approach with a time-varying source model is very effective in achieving speech dereverberation based only on a short observation. In addition, STFT domain processing has been shown to be promising for implementing this dereverberation approach in a computationally efficient way. This paper presents a way of further improving the STFT domain speech dereverberation in terms of both computational cost and accuracy. One important issue here is how to calculate time-domain convolution with a long filter precisely using STFT. We introduce an STFT domain filtering method with crossband effect compensation for this purpose. Experimental results show that the proposed method allows us to implement the dereverberation algorithm in the STFT domain more precisely with less computational cost than the existing method.
Citations: 5
On the Use of Empirically Determined Impulse Responses for Improving Distant Talking Speech Recognition
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538710
T. Plötz, G. Fink
Abstract: Recognition rates of distant-talking speech recognition applications decrease substantially if the acoustic environment contains reverberation. Although standard approaches for compensating such distortions, e.g. cepstral mean subtraction (CMS), are quite effective, they are not appropriate for dynamic human-machine interaction. When speakers at different positions utter only short portions of speech, compensation methods that require several seconds of speech fail. For this kind of application, we present a dereverberation approach utilizing empirically determined impulse responses. Prior to speaking, users are asked to produce an impulse-like signal (clapping their hands or snapping their fingers), which is used for compensation. By means of an experimental evaluation on the German Verbmobil corpus, we demonstrate the promising potential of the approach.
Citations: 2
A Joint Particle Filter and Multi-Step Linear Prediction Framework to Provide Enhanced Speech Features Prior to Automatic Recognition
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538704
M. Wölfel
Abstract: Automatic speech recognition that works well on recordings captured with mid- or far-field microphones is essential for natural verbal communication between humans and machines. While a great deal of research effort has addressed one of the two distortions frequently encountered in mid- and far-field sound capture, namely non-stationary noise and reverberation, much less work has been undertaken to jointly combat both kinds of distortion. In our view, however, this joint approach is essential in order to further reduce the catastrophic effects of noise and reverberation that are encountered as soon as the microphone is more than a few centimeters from the speaker's mouth. We propose here to integrate an estimate of the reverberation obtained by multi-step linear prediction into a particle filter framework that tracks and removes non-stationary additive distortions. Evaluations on actual recordings with different speaker-to-microphone distances demonstrate that techniques combating either non-stationary noise or reverberation can be combined to good effect.
Citations: 6
Maximum Likelihood Time Delay Estimation with Phase Domain Analysis in the Generalized Cross Correlation Framework
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538695
Bowon Lee, A. Said, T. Kalker, R. Schafer
Abstract: We propose a new method for efficiently estimating the maximum likelihood frequency weighting in the generalized cross correlation framework for time delay estimation. The estimation is based on the analysis of the cross spectrum between a pair of microphones. We model how the phase distribution is affected by both noise and reverberation, and relax the common assumption that noise and reverberation are uncorrelated with the source. Thus, our method does not require knowledge of the noise spectrum or a detailed model of the reverberation. Experimental results show that the proposed method is superior to the PHAT method.
Citations: 8
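For reference, the PHAT baseline that the proposed maximum-likelihood weighting is compared against can be sketched as follows. This is the standard GCC-PHAT estimator (whiten the cross-spectrum, take the inverse FFT, pick the peak lag), not the paper's phase-domain ML weighting; the function name and the optional `max_tau` search limit are assumptions:

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Delay of y relative to x in seconds (positive if y lags x), via GCC-PHAT."""
    n = len(x) + len(y)                       # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = np.conj(X) * Y                    # cross-spectrum
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)             # generalized cross-correlation, lag l at index l mod n
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Negative lags map to the end of the array via negative indexing.
    lags = np.arange(-max_shift, max_shift + 1)
    lag = int(lags[np.argmax(np.abs(cc[lags]))])
    return lag / fs
```

Replacing the line that divides by `np.abs(cross)` with a different frequency weighting gives other members of the GCC family, which is the slot the paper's ML weighting is designed to fill.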
Bridging the Gap: Towards a Unified Framework for Hands-Free Speech Recognition Using Microphone Arrays
2008 Hands-Free Speech Communication and Microphone Arrays, Pub Date: 2008-05-06, DOI: 10.1109/HSCMA.2008.4538698
Michael L. Seltzer
Abstract: In this paper we describe two families of algorithms for hands-free speech recognition using microphone arrays. Enhancement-based approaches use a cascade of independent processing blocks to perform speech enhancement followed by speech recognition. We discuss the reasons why this approach may be sub-optimal and motivate the need for a solution that tightly integrates all processing blocks into a common unified framework. This leads to a second family of algorithms, called unified approaches, which consider all processing stages to be components of a single system that operates with the common goal of improved recognition accuracy. We describe several examples of such algorithms that have been shown to outperform more traditional signal-processing-based approaches. In doing so, we hope to convey the benefits of performing hands-free speech recognition in this manner and motivate further research in this area.
Citations: 36