{"title":"A Comparative Study of Adaptation-Mode Control for Generalized Sidelobe Cancellers in Human-Robot Communication","authors":"A. Sugiyama, Thanh Phong Hua, R. Le Bouquin Jeanne, G. Faucon","doi":"10.1109/HSCMA.2008.4538676","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538676","url":null,"abstract":"This paper presents a comparative study of adaptation-mode control (AMC) for generalized sidelobe cancellers in human-robot communication. Performance of recently proposed two AMC structures, namely, NBM-SLBM (nested blocking matrix-symmetric leaky blocking matrix) and M-SLBM (multiple symmetric leaky blocking matrix), are evaluated by computer simulations and in a real environment. In the computer simulations, it is shown that M-SLBM exhibits superior performance to NBM-SLBM. However, in the real environment, the performance of M-SLBM is degraded. This degradation comes from unexpected tonal interference in a frequency range covered by an SLBM, leading to errors. An appropriate selection between NBM-SLBM and M-SLBM is necessary based on the environment.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115400713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A DOA Based Speaker Diarization System for Real Meetings","authors":"S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, S. Makino","doi":"10.1109/HSCMA.2008.4538680","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538680","url":null,"abstract":"This paper presents a speaker diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system utilized the generalized cross correlation method with the phase transform (GCC-PHAT) approach for the DOA estimation. Because the GCC-PHAT can estimate just one DOA per frame, it was difficult to handle speaker overlaps. This paper tries to deal with this issue by employing a DOA at each time-frequency slot (TFDOA), and reports how it improves diarization performance for real meetings / conversations recorded in a room with a reverberation time of 350 ms.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121117131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Directional Audio Coding Using Planar Microphone Arrays","authors":"F. Kuech, M. Kallinger, R. Schultz-Amling, G. del Galdo, J. Ahonen, V. Pulkki","doi":"10.1109/HSCMA.2008.4538682","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538682","url":null,"abstract":"Multichannel sound systems become more and more established in modern audio applications. Consequently, the recording and the reproduction of spatial audio gains increasing attention. Directional Audio Coding (DirAC) represents an efficient approach to analyze spatial sound and to reproduce it using arbitrary loudspeaker configurations. In DirAC, the direction-of-arrival and the diffuseness of sound within frequency subbands is used to encode the spatial properties of the observed sound field. The estimation of these parameters is based on an energetic sound field analysis using three- dimensional microphone arrays. In practice, however, physical design constraints make three-dimensional microphone configurations often not acceptable. In this paper, we consider a new approach to microphone array processing that allows for an estimation of both direction-of-arrival of sound and diffuseness based on planar microphone configurations. The performance of the proposed method is evaluated via simulations and real measured data.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124887876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Dereverberation for Hands-Free Speech Recognition","authors":"R. Gomez, J. Even, H. Saruwatari, K. Shikano","doi":"10.1109/HSCMA.2008.4538706","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538706","url":null,"abstract":"A robust dereverberation technique for real-time hands-free speech recognition application is proposed. Real-time implementation is made possible by avoiding time-consuming blind estimation. Instead, we use the impulse response by effectively identifying the late reflection components of it. Using this information, together with the concept of Spectral Subtraction (SS), we were able to remove the effects of the late reflection of the reverberant signal. After dereverberation, only the effects of the early component is left and used as input to the recognizer. In this method, multi-band SS is used in order to compensate for the error arising from approximation. We also introduced a training strategy to optimize the values of the multi-band coefficients to minimize the error.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125903927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum Likelihood Detector of Reliable Direction-of-Arrival Estimate","authors":"Seungil Kim, G. Song, Hyejeong Jeon, Lag-Yong Kim","doi":"10.1109/HSCMA.2008.4538691","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538691","url":null,"abstract":"In this paper, we propose a maximum likelihood detector for reliable sound source localization system. It is based on making a measure of reliability of estimation results. The reliability can be reduced from waterbed effect of source localization algorithm. If the calculated reliability measure has a lower value than a predefined threshold, the estimated direction-of-arrival (DOA) is regarded as a wrong result and subsequently discarded. We determine the threshold for reliable estimate selection using maximum likelihood rule. Some experiments show that the proposed method can reject perturbed results of the estimated DOA.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125285452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced Direction Estimation Using Microphone Arrays for Directional Audio Coding","authors":"M. Kallinger, F. Kuech, R. Schultz-Amling, G. del Galdo, J. Ahonen, V. Pulkki","doi":"10.1109/HSCMA.2008.4538684","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538684","url":null,"abstract":"Modern home entertainment systems offer surround sound audio playback. This progress over known mono and stereo devices is also intended for high quality hands-free telephony to enhance intelligibility of speech in group conversation. Directional Audio Coding (DirAC) provides an efficient and well-established way to record and encode spatial sound and to render it at an arbitrary loudspeaker setup. On the recording site, DirAC is based on B-format microphone signals. These signals can be obtained by one omnidirectional and three figure-of-eight microphones pointing along the axes of a three-dimensional Cartesian coordinate system. However, a grid of omnidirectional microphones is more appropriate for consumer applications due to economic reasons. Arrays can provide the required figure-of-eight directionality only for a certain frequency range. However, in this contribution we show that a straightforward direction estimator is biased. After formulating the bias analytically we propose an unbiased estimator and derive the theoretical limits for unique direction estimation. The results are illustrated by means of simulations and measurements.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125416363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Separation Using an Adaptive Sparse Dictionary Algorithm","authors":"M. Jafari, Mark D. Plumbley, M. Davies","doi":"10.1109/HSCMA.2008.4538679","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538679","url":null,"abstract":"We present a greedy adaptive algorithm that builds a sparse orthogonal dictionary from the observed data. In this paper, the algorithm is used to separate stereo speech signals, and the phase information that is inherent to the extracted atom pairs is used for clustering and identification of the original sources. The performance of the algorithm is compared to that of the adaptive stereo basis algorithm, when the sources are mixed in echoic and anechoic environments. We find that the algorithm correctly separates the sources, and can do this even with a relatively small number of atoms.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126626749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System Identification for Multi-Channel Listening-Room Compensation Using an Acoustic Echo Canceller","authors":"Stefan Goetze, M. Kallinger, A. Mertins, K. Kammeyer","doi":"10.1109/HSCMA.2008.4538727","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538727","url":null,"abstract":"Modern hands-free telecommunication devices jointly apply several subsystems, e.g. for noise reduction (NR), acoustic echo cancellation (AEC) and listening-room compensation (LRC). In this contribution the combination of an equalizer for listening room compensation and an acoustic echo canceller is analyzed. Inverse filtering of room impulse responses (RIRs) is a challenging task since they are, in general, mixed phase systems having hundreds of zeros inside and outside near the unit circle in the z-domain. Furthermore, a reliable estimate of the RIR which shall be inverted is important. Since RIRs are time-variant due to possible changes of the acoustic environment, they have to be identified adaptively. If an AEC (or any other adaptive method) is used to identify the time variant room impulse responses the estimate's distance to the real RIRs may be too high for a satisfying equalization, especially in periods of initial convergence of the AEC or after RIR changes. Therefore, we propose to estimate the convergence state of the AEC and to incorporate this knowledge into the equalizer design.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130436895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Estimation and Suppression of Late Reverberation Utilising Auditory Masking","authors":"A. Tsilfidis, J. Mourjopoulos, D. Tsoukalas","doi":"10.1109/HSCMA.2008.4538723","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538723","url":null,"abstract":"A new method for blind estimation and suppression of late reverberation of speech signals is presented. The proposed algorithm consists of two steps. In a first step, the reverberation time is blindly determined from the reverberant signal. Then, an approximation of the power spectrum of late reverberation is subtracted from the power spectrum of the reverberant signal. Hence, a preliminary estimation of the anechoic speech spectrum is derived. In a second step, the auditory masking threshold of the clean spectrum estimation is calculated and used to define the coefficients for a nonlinear filter for the reverberant signal, which produces the final enhanced speech signal. The performance of the algorithm is demonstrated on artificially generated signals. Subjective tests are conducted and their results indicate that the quality of the speech signals obtained by the proposed method is superior when compared to previous methods.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134443258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration of Phoneme-Subspaces Using ICA for Speech Feature Extraction and Recognition","authors":"Hyunsin Park, T. Takiguchi, Y. Ariki","doi":"10.1109/HSCMA.2008.4538708","DOIUrl":"https://doi.org/10.1109/HSCMA.2008.4538708","url":null,"abstract":"In our previous work, the use of PCA instead of DCT shows robustness in distorted speech recognition because the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features [1]. This paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and their elements are statistically mutual independent. The proposed speech feature vector is generated by projecting observed vector onto integrated space obtained by PCA and ICA. The performance evaluation shows that the proposed method provides a higher isolated word recognition accuracy than conventional methods in some reverberant conditions.","PeriodicalId":129827,"journal":{"name":"2008 Hands-Free Speech Communication and Microphone Arrays","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131807442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}