IEEE Transactions on Audio Speech and Language Processing: Latest Articles

Nonlinear Acoustic Echo Cancellation Based on a Sliding-Window Leaky Kernel Affine Projection Algorithm
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-09-01 DOI: 10.1109/TASL.2013.2260742
Jose Manuel Gil-Cacho, M. Signoretto, T. Waterschoot, M. Moonen, S. H. Jensen
Abstract: Acoustic echo cancellation (AEC) is used in speech communication systems where the existence of echoes degrades the speech intelligibility. Standard approaches to AEC rely on the assumption that the echo path to be identified can be modeled by a linear filter. However, some elements introduce nonlinear distortion and must be modeled as nonlinear systems. Several nonlinear models have been used with more or less success. The kernel affine projection algorithm (KAPA) has been successfully applied to many areas in signal processing but not yet to nonlinear AEC (NLAEC). The contribution of this paper is three-fold: (1) to apply KAPA to the NLAEC problem, (2) to develop a sliding-window leaky KAPA (SWL-KAPA) that is well suited for NLAEC applications, and (3) to propose a kernel function consisting of a weighted sum of a linear and a Gaussian kernel. In our experimental set-up, the proposed SWL-KAPA for NLAEC consistently outperforms the linear APA, resulting in up to 12 dB of improvement in ERLE at a computational cost that is only 4.6 times higher. Moreover, it is shown that the SWL-KAPA outperforms, by 4-6 dB, a Volterra-based NLAEC, which itself has a computational cost 413 times higher than that of the linear APA.
Citations: 58
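The abstract above names a kernel built as a weighted sum of a linear and a Gaussian kernel, and reports gains in ERLE (echo return loss enhancement). A minimal sketch of both quantities follows. The weight names `alpha` and `beta`, the bandwidth `sigma`, and the exact Gaussian parameterization are assumptions made for illustration; the paper's own parameterization is not given in the abstract.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian (RBF) kernel; the 2*sigma^2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def combined_kernel(
    x: Sequence[float],
    y: Sequence[float],
    alpha: float = 0.5,   # hypothetical weight on the linear part
    beta: float = 0.5,    # hypothetical weight on the Gaussian part
    sigma: float = 1.0,   # hypothetical Gaussian bandwidth
) -> float:
    """Weighted sum of a linear and a Gaussian kernel."""
    return alpha * linear_kernel(x, y) + beta * gaussian_kernel(x, y, sigma)


def erle_db(echo: Sequence[float], residual: Sequence[float]) -> float:
    """ERLE in dB: ratio of echo power to residual-echo power."""
    echo_power = sum(v * v for v in echo)
    residual_power = sum(v * v for v in residual)
    return 10.0 * math.log10(echo_power / residual_power)
```

With nonnegative weights, the sum of two valid kernels is itself a valid kernel, which is what allows the linear part to capture the linear echo path while the Gaussian part models the nonlinear remainder.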
Sound Source Distance Estimation in Rooms based on Statistical Properties of Binaural Signals
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2260155
Eleftheria Georganti, T. May, S. Par, J. Mourjopoulos
Abstract: A novel method for the estimation of the distance of a sound source from binaural speech signals is proposed. The method relies on several statistical features extracted from such signals and their binaural cues. Firstly, the standard deviation of the difference of the magnitude spectra of the left and right binaural signals is used as a feature for this method. In addition, an extended set of additional statistical features that can improve distance detection is extracted from an auditory front-end which models the peripheral processing of the human auditory system. The method incorporates the above features into two classification frameworks based on Gaussian mixture models and Support Vector Machines, and the relative merits of those frameworks are evaluated. The proposed method achieves distance detection when tested in various acoustical environments and performs well in unknown environments. Its performance is also compared to an existing binaural distance detection method.
Citations: 36
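The first feature named in the abstract above, the standard deviation of the difference between the left and right magnitude spectra, can be sketched as follows. This is an illustrative reading only: it uses a naive DFT on a single frame and linear magnitudes, whereas the paper may use a different transform length, framing, or a dB scale. The function names are hypothetical.

```python
import cmath
import statistics
from typing import List, Sequence


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """Magnitudes of the one-sided DFT (bins 0..n/2), computed naively."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc))
    return spectrum


def spectral_difference_std(left: Sequence[float], right: Sequence[float]) -> float:
    """Standard deviation across frequency of |L(k)| - |R(k)|."""
    if len(left) != len(right):
        raise ValueError("left and right frames must have the same length")
    mag_left = magnitude_spectrum(left)
    mag_right = magnitude_spectrum(right)
    differences = [a - b for a, b in zip(mag_left, mag_right)]
    return statistics.pstdev(differences)
```

Identical ear signals give a feature value of zero; any spectral mismatch between the ears, such as that introduced by room reflections, makes the difference vary across frequency and raises the value.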
Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2258013
Volker Leutnant, A. Krueger, Reinhold Häb-Umbach
Abstract: In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a side product of the inference of the a posteriori probability density function of the clean speech feature vectors. Furthermore, a reduction of the computational effort and the memory requirements is achieved by using a recursive formulation of the observation model. The performance of the proposed algorithms is first experimentally studied on a connected digits recognition task with artificially created noisy reverberant data. It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000-word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained, demonstrating the effectiveness of the approach on real-world data.
Citations: 9
A General Compression Approach to Multi-Channel Three-Dimensional Audio
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2260156
B. Cheng, C. Ritz, I. Burnett, Xiguang Zheng
Abstract: This paper presents a technique for low bit rate compression of three-dimensional (3D) audio produced by multiple loudspeaker channels. The approach is based on the time-frequency analysis of the localization of spatial sound sources within the 3D space as rendered by a multi-channel audio signal (in this case 16 channels). This analysis results in the derivation of a stereo downmix signal representing the original 16 channels. Alternatively, a mono downmix signal with side information representing the location of sound sources within the 3D spatial scene can also be derived. The resulting downmix signals are then compressed with a traditional audio coder, resulting in a representation of the 3D soundfield at bit rates comparable with existing stereo audio coders while maintaining the perceptual quality produced from separate encoding of each channel.
Citations: 16
Sparse Classifier Fusion for Speaker Verification
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2256895
Ville Hautamäki, T. Kinnunen, Filip Sedlak, Kong-Aik Lee, B. Ma, Haizhou Li
Abstract: State-of-the-art speaker verification systems take advantage of a number of complementary base classifiers by fusing them to arrive at reliable verification decisions. In speaker verification, fusion is typically implemented as a weighted linear combination of the base classifier scores, where the combination weights are estimated using a logistic regression model. An alternative way for fusion is to use classifier ensemble selection, which can be seen as sparse regularization applied to logistic regression. Even though score fusion has been extensively studied in speaker verification, classifier ensemble selection is much less studied. In this study, we extensively study a sparse classifier fusion on a collection of twelve I4U spectral subsystems on the NIST 2008 and 2010 speaker recognition evaluation (SRE) corpora.
Citations: 48
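The abstract above describes fusion as a weighted linear combination of base classifier scores with weights fitted by logistic regression, and ensemble selection as sparse regularization of that regression. A toy sketch of this idea is shown below, assuming an L1 penalty handled by proximal gradient descent (soft-thresholding). The optimizer, the function names, and the hyperparameters `lam`, `lr`, and `iters` are illustrative assumptions, not the authors' training procedure.

```python
import math
from typing import List, Sequence, Tuple


def fuse(scores: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Fused score: weighted linear combination of base classifier scores."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of the L1 penalty: shrinks x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def train_sparse_fusion(
    scores: Sequence[Sequence[float]],  # one row of base scores per trial
    labels: Sequence[int],              # 1 = target trial, 0 = impostor
    lam: float,                         # L1 strength (assumed hyperparameter)
    lr: float = 0.1,
    iters: int = 200,
) -> Tuple[List[float], float]:
    """L1-regularized logistic regression via proximal gradient steps."""
    n, m = len(scores), len(scores[0])
    weights, bias = [0.0] * m, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * m, 0.0
        for row, y in zip(scores, labels):
            err = sigmoid(fuse(row, weights, bias)) - y
            grad_b += err / n
            for j in range(m):
                grad_w[j] += err * row[j] / n
        bias -= lr * grad_b  # the bias is left unpenalized
        weights = [
            soft_threshold(w - lr * g, lr * lam) for w, g in zip(weights, grad_w)
        ]
    return weights, bias
```

Base classifiers whose weights are driven exactly to zero are effectively dropped from the ensemble, which is the sense in which sparse regularization performs classifier selection.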
Syntax-Based Translation With Bilingually Lexicalized Synchronous Tree Substitution Grammars
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2255283
Jiajun Zhang, Feifei Zhai, Chengqing Zong
Abstract: Syntax-based models can significantly improve the translation performance due to their grammatical modeling on one or both language side(s). However, the translation rules such as the non-lexical rule “VP→(x0x1,VP:x1PP:x0)” in string-to-tree models do not consider any lexicalized information on the source or target side. The rule is so generalized that any subtree rooted at VP can substitute for the nonterminal VP:x1. Because rules containing nonterminals are frequently used when generating the target-side tree structures, there is a risk that rules of this type will be severely misused in decoding due to a lack of lexicalization guidance. In this article, inspired by lexicalized PCFG, which is widely used in monolingual parsing, we propose to upgrade the STSG (synchronous tree substitution grammars)-based syntax translation model with bilingually lexicalized STSG. Using the string-to-tree translation model as a case study, we present generative and discriminative models to integrate lexicalized STSG into the translation model. Both small- and large-scale experiments on Chinese-to-English translation demonstrate that the proposed lexicalized STSG can provide superior rule selection in decoding and substantially improve the translation quality.
Citations: 6
A Class of Algorithms for Time-Frequency Multiplier Estimation
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2255274
Anaïk Olivero, B. Torrésani, R. Kronland-Martinet
Abstract: We propose here a new approach together with a corresponding class of algorithms for offline estimation of linear operators mapping input to output signals. The operators are modeled as multipliers, i.e., linear and diagonal operators in a frame or Bessel representation of signals (like Gabor, wavelets ...) and characterized by a transfer function. The estimation problem is formulated as a regularized inverse problem, and solved using iterative algorithms based on gradient descent schemes. Various estimation problems, which differ by a choice for the regularization function, are studied in the case of Gabor multipliers. The transfer function actually provides a meaningful interpretation of the differences between the two signals or signal classes under consideration, and examples are discussed. Furthermore, examples of signal transformations with such Gabor transfer functions are also given.
Citations: 25
The Spectral Nature of Maximum Likelihood Noise Compensated Linear Prediction
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2255277
L. Weruaga, L. Dimitrov
Abstract: The study of the effects of noise in autoregressive (AR) analysis (or linear prediction) and its compensation (NCAR) has commonly been carried out in the time domain under the least square (LS) criterion. This paper studies the adequacy of such an approach by means of a comparative analysis with selected frequency-based NCAR methods. In particular, the maximization of the spectral likelihood (ML) results in a proper optimization problem that is easy to solve and brings useful insights into the rationale of the NCAR problem. On the contrary, popular time-based NCAR methods are shown in the paper to be designed, in the ML context, around ill-conditioned criteria, requiring constraints to guarantee stable solutions. A statistical analysis on a realistic scenario as well as an experiment on speech enhancement complement this analysis.
Citations: 3
Broadband DOA Estimation Using Sensor Arrays on Complex-Shaped Rigid Bodies
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2255282
Dumidu S. Talagala, Wen Zhang, T. Abhayapala
Abstract: Sensor arrays mounted on complex-shaped rigid bodies are a common feature in many practical broadband direction of arrival (DOA) estimation applications. The scattering and reflections caused by these rigid bodies introduce complexity and diversity in the frequency domain of the channel transfer function, which presents several challenges to existing broadband DOA estimators. This paper presents a novel high resolution broadband DOA estimation technique based on signal subspace decomposition. We describe how broadband signals can be decomposed into narrow subband components, and combined such that the frequency domain diversity is retained. The DOA estimation performance is compared with existing techniques using a uniform circular array and a sensor array on a hypothetical rigid body. An improvement in closely spaced source resolution of up to 6 dB is observed for the sensor array on the hypothetical rigid body, in comparison to the uniform circular array. The results suggest that frequency domain diversity, introduced by complex-shaped rigid bodies, can provide higher resolution and clearer separation of closely spaced broadband sound sources.
Citations: 10
A Free-Source Method (FrSM) for Calibrating a Large-Aperture Microphone Array
IEEE Transactions on Audio Speech and Language Processing Pub Date: 2013-08-01 DOI: 10.1109/TASL.2013.2256896
Sarthak Khanal, H. Silverman, Rahul R. Shakya
Abstract: Large-aperture microphone arrays can be used to capture and enhance speech from individual talkers in noisy, multi-talker, and reverberant environments. However, they must be calibrated, often more than once, to obtain accurate 3-dimensional coordinates for all microphones. Direct-measurement techniques, such as using a measuring tape or a laser-based tool, are cumbersome and time-consuming. Some previous methods that used acoustic signals for array calibration required bulky hardware and/or fixed, known source locations. Others, which allowed more flexible source placement, often have issues with real data, have reported results for 2D only, or work only for small arrays. This paper describes a complete and robust method for automatic calibration using acoustic signals which is simple, repeatable, accurate, and has been shown to work for a real system. The method requires only a single transducer (speaker) with a microphone attached above its center. The unit is freely moved around the focal volume of the microphone array, generating a single long recording from all the microphones. After that, the system is completely automatic. We describe the free source method (FrSM), validate its effectiveness and present accuracy results against measured ground truth. The performance of FrSM is compared to that from several other methods for a real 128-microphone array.
Citations: 15