"Nonlinear Acoustic Echo Cancellation Based on a Sliding-Window Leaky Kernel Affine Projection Algorithm"
Jose Manuel Gil-Cacho, M. Signoretto, T. Waterschoot, M. Moonen, S. H. Jensen
IEEE Transactions on Audio, Speech, and Language Processing, September 2013. DOI: 10.1109/TASL.2013.2260742

Abstract: Acoustic echo cancellation (AEC) is used in speech communication systems where echoes degrade speech intelligibility. Standard approaches to AEC rely on the assumption that the echo path to be identified can be modeled by a linear filter. However, some elements introduce nonlinear distortion and must be modeled as nonlinear systems. Several nonlinear models have been used with varying success. The kernel affine projection algorithm (KAPA) has been successfully applied in many areas of signal processing, but not yet to nonlinear AEC (NLAEC). The contribution of this paper is threefold: (1) to apply KAPA to the NLAEC problem, (2) to develop a sliding-window leaky KAPA (SWL-KAPA) that is well suited to NLAEC applications, and (3) to propose a kernel function consisting of a weighted sum of a linear and a Gaussian kernel. In our experimental set-up, the proposed SWL-KAPA consistently outperforms the linear APA, yielding up to 12 dB of improvement in ERLE at a computational cost only 4.6 times higher. Moreover, it is shown that the SWL-KAPA outperforms, by 4-6 dB, a Volterra-based NLAEC, whose computational cost is 413 times that of the linear APA.

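The proposed kernel is stated concretely in the abstract: a weighted sum of a linear and a Gaussian kernel. A minimal sketch follows; the mixing weight `alpha` and Gaussian width `sigma` are illustrative defaults, not values from the paper.

```python
import numpy as np

def mixed_kernel(x, y, alpha=0.5, sigma=1.0):
    """Weighted sum of a linear and a Gaussian (RBF) kernel.

    alpha (mixing weight) and sigma (Gaussian width) are illustrative
    assumptions, not values taken from the paper.
    """
    linear = np.dot(x, y)                                        # linear kernel
    gaussian = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))  # RBF kernel
    return alpha * linear + (1.0 - alpha) * gaussian
```

For alpha in [0, 1] this remains a valid positive-definite kernel, since nonnegative combinations of kernels are again kernels; the linear term lets the kernel machine model the (dominant) linear part of the echo path while the Gaussian term captures nonlinear distortion.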
"Sound Source Distance Estimation in Rooms based on Statistical Properties of Binaural Signals"
Eleftheria Georganti, T. May, S. Par, J. Mourjopoulos
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2260155

Abstract: A novel method is proposed for estimating the distance of a sound source from binaural speech signals. The method relies on several statistical features extracted from such signals and their binaural cues. First, the standard deviation of the difference between the magnitude spectra of the left and right binaural signals is used as a feature. In addition, an extended set of statistical features that can improve distance detection is extracted from an auditory front-end that models the peripheral processing of the human auditory system. The method incorporates these features into two classification frameworks, based on Gaussian mixture models and support vector machines, and the relative merits of the two frameworks are evaluated. The proposed method achieves distance detection when tested in various acoustical environments and performs well in unknown environments. Its performance is also compared to that of an existing binaural distance detection method.

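The first feature described above, the standard deviation of the difference of the left/right magnitude spectra, can be sketched as follows. Single-frame FFT analysis and a dB scale are simplifying assumptions here; the paper's exact front-end may differ.

```python
import numpy as np

def spectral_deviation_feature(left, right, eps=1e-12):
    """Standard deviation of the difference between the left and right
    log-magnitude spectra of a binaural signal pair.

    A single-frame FFT and a dB scale are illustrative simplifications.
    """
    mag_l = 20.0 * np.log10(np.abs(np.fft.rfft(left)) + eps)   # left spectrum, dB
    mag_r = 20.0 * np.log10(np.abs(np.fft.rfft(right)) + eps)  # right spectrum, dB
    return np.std(mag_l - mag_r)
```

Intuitively, at larger source distances the reverberant field decorrelates the two ear signals, so their spectral difference fluctuates more across frequency and the feature grows.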
"Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition"
Volker Leutnant, A. Krueger, Reinhold Häb-Umbach
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2258013

Abstract: In this contribution we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to additionally compensate for background noise. A recently proposed observation model is employed, whose time-variant observation error statistics are obtained as a by-product of inferring the a posteriori probability density function of the clean speech feature vectors. Furthermore, a reduction of the computational effort and memory requirements is achieved by using a recursive formulation of the observation model. The performance of the proposed algorithms is first studied experimentally on a connected-digits recognition task with artificially created noisy reverberant data. It is shown that the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000-word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained, demonstrating the effectiveness of the approach on real-world data.

"A General Compression Approach to Multi-Channel Three-Dimensional Audio"
B. Cheng, C. Ritz, I. Burnett, Xiguang Zheng
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2260156

Abstract: This paper presents a technique for low-bit-rate compression of three-dimensional (3D) audio produced by multiple loudspeaker channels. The approach is based on time-frequency analysis of the localization of spatial sound sources within the 3D space as rendered by a multi-channel audio signal (in this case, 16 channels). This analysis yields a stereo downmix signal representing the original 16 channels. Alternatively, a mono downmix signal with side information representing the location of sound sources within the 3D spatial scene can also be derived. The resulting downmix signals are then compressed with a traditional audio coder, yielding a representation of the 3D soundfield at bit rates comparable to those of existing stereo audio coders while maintaining the perceptual quality obtained from separate encoding of each channel.

"Sparse Classifier Fusion for Speaker Verification"
Ville Hautamäki, T. Kinnunen, Filip Sedlak, Kong-Aik Lee, B. Ma, Haizhou Li
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2256895

Abstract: State-of-the-art speaker verification systems take advantage of a number of complementary base classifiers by fusing them to arrive at reliable verification decisions. In speaker verification, fusion is typically implemented as a weighted linear combination of the base classifier scores, with the combination weights estimated using a logistic regression model. An alternative is classifier ensemble selection, which can be seen as sparse regularization applied to logistic regression. Even though score fusion has been extensively studied in speaker verification, classifier ensemble selection has received much less attention. In this study, we extensively examine sparse classifier fusion on a collection of twelve I4U spectral subsystems on the NIST 2008 and 2010 speaker recognition evaluation (SRE) corpora.

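Sparse regularization applied to logistic-regression score fusion can be illustrated with an L1 penalty trained by proximal gradient steps, which drives the weights of uninformative base classifiers to exactly zero (implicitly deselecting them from the ensemble). The hyper-parameters and optimizer below are illustrative choices, not the study's actual setup.

```python
import numpy as np

def train_sparse_fusion(S, y, lam=0.01, lr=0.1, iters=500):
    """L1-regularized logistic-regression score fusion (proximal gradient).

    S: (n_trials, n_classifiers) matrix of base-classifier scores;
    y: labels in {0, 1} (1 = target trial). lam, lr, iters are
    illustrative hyper-parameters.
    """
    n, k = S.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(S @ w + b)))   # sigmoid of fused score
        grad_w = S.T @ (p - y) / n               # logistic-loss gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
        # soft-thresholding: the proximal step of the L1 penalty zeroes
        # out weights of base classifiers that contribute too little
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b
```

The fused score for a trial is then simply `S_row @ w + b`, the weighted linear combination described above.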
"Syntax-Based Translation With Bilingually Lexicalized Synchronous Tree Substitution Grammars"
Jiajun Zhang, Feifei Zhai, Chengqing Zong
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2255283

Abstract: Syntax-based models can significantly improve translation performance owing to their grammatical modeling of one or both language sides. However, translation rules such as the non-lexical rule "VP→(x0x1, VP:x1 PP:x0)" in string-to-tree models do not consider any lexicalized information on the source or target side. Such a rule is so general that any subtree rooted at VP can substitute for the nonterminal VP:x1. Because rules containing nonterminals are frequently used when generating target-side tree structures, there is a risk that rules of this type will be severely misused in decoding for lack of lexicalization guidance. In this article, inspired by lexicalized PCFG, which is widely used in monolingual parsing, we propose to upgrade the STSG (synchronous tree substitution grammar)-based syntax translation model with a bilingually lexicalized STSG. Using the string-to-tree translation model as a case study, we present generative and discriminative models to integrate the lexicalized STSG into the translation model. Both small- and large-scale experiments on Chinese-to-English translation demonstrate that the proposed lexicalized STSG provides superior rule selection in decoding and substantially improves translation quality.

"A Class of Algorithms for Time-Frequency Multiplier Estimation"
Anaïk Olivero, B. Torrésani, R. Kronland-Martinet
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2255274

Abstract: We propose a new approach, together with a corresponding class of algorithms, for offline estimation of linear operators mapping input to output signals. The operators are modeled as multipliers, i.e., linear operators that are diagonal in a frame or Bessel representation of the signals (e.g., Gabor, wavelets) and characterized by a transfer function. The estimation problem is formulated as a regularized inverse problem and solved using iterative algorithms based on gradient descent schemes. Several estimation problems, which differ in the choice of regularization function, are studied in the case of Gabor multipliers. The transfer function provides a meaningful interpretation of the differences between the two signals or signal classes under consideration, and examples are discussed. Furthermore, examples of signal transformations with such Gabor transfer functions are also given.

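For a diagonal (multiplier) model Y ≈ m·X acting coefficient-wise on a time-frequency representation, the simplest regularized inverse problem, a quadratic (Tikhonov/ridge) penalty, even has a closed-form solution per coefficient. This sketch illustrates only that baseline formulation; the paper studies more general regularizers solved with iterative gradient schemes.

```python
import numpy as np

def estimate_multiplier(X, Y, lam=1e-3):
    """Ridge-regularized estimate of a diagonal transfer function m
    such that Y ≈ m * X, coefficient by coefficient.

    X, Y: time-frequency (e.g., Gabor/STFT) coefficient arrays of the
    same shape; lam is an illustrative regularization weight.
    Minimizes |Y - m*X|^2 + lam*|m|^2 independently per coefficient.
    """
    return np.conj(X) * Y / (np.abs(X) ** 2 + lam)
```

The regularization keeps the estimate bounded where the input coefficients X are close to zero, which is exactly where the unregularized ratio Y/X would blow up.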
"The Spectral Nature of Maximum Likelihood Noise Compensated Linear Prediction"
L. Weruaga, L. Dimitrov
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2255277

Abstract: Noise compensation in autoregressive (AR) analysis (or linear prediction), known as NCAR, has commonly been carried out in the time domain under the least-squares (LS) criterion. This paper studies the adequacy of that approach through a comparative analysis with selected frequency-based NCAR methods. In particular, maximizing the spectral likelihood (ML) results in a well-posed optimization problem that is easy to solve and brings useful insight into the rationale of the NCAR problem. In contrast, popular time-based NCAR methods are shown to be designed, in the ML context, around ill-conditioned criteria, requiring constraints to guarantee stable solutions. A statistical analysis on a realistic scenario as well as an experiment on speech enhancement complement this analysis.

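The time-domain compensation analyzed above can be illustrated in its textbook form: subtract the known additive-noise variance from the zero-lag autocorrelation before solving the Yule-Walker normal equations. This sketches the kind of LS-based baseline under discussion, not the paper's proposed spectral maximum-likelihood method.

```python
import numpy as np

def noise_compensated_lp(x, order, noise_var=0.0):
    """Linear-prediction coefficients from noise-compensated autocorrelation.

    A classic time-domain NCAR baseline (an illustrative assumption, not
    the paper's ML method): for white additive noise of known variance,
    only the lag-0 autocorrelation is biased, so subtract noise_var there
    before solving the normal equations.
    """
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    r[0] -= noise_var                       # compensate noise at lag 0
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])  # Yule-Walker equations
    return a
```

As the paper notes for this family of methods, too large a noise_var can make the compensated autocorrelation matrix indefinite, which is why stability constraints become necessary.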
"Broadband DOA Estimation Using Sensor Arrays on Complex-Shaped Rigid Bodies"
Dumidu S. Talagala, Wen Zhang, T. Abhayapala
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2255282

Abstract: Sensor arrays mounted on complex-shaped rigid bodies are a common feature of many practical broadband direction-of-arrival (DOA) estimation applications. The scattering and reflections caused by these rigid bodies introduce complexity and diversity into the frequency domain of the channel transfer function, which presents several challenges to existing broadband DOA estimators. This paper presents a novel high-resolution broadband DOA estimation technique based on signal subspace decomposition. We describe how broadband signals can be decomposed into narrow subband components and combined such that the frequency-domain diversity is retained. The DOA estimation performance is compared with that of existing techniques using a uniform circular array and a sensor array on a hypothetical rigid body. An improvement in closely spaced source resolution of up to 6 dB is observed for the sensor array on the hypothetical rigid body, in comparison to the uniform circular array. The results suggest that the frequency-domain diversity introduced by complex-shaped rigid bodies can provide higher resolution and clearer separation of closely spaced broadband sound sources.

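The decompose-into-subbands-then-combine idea can be illustrated with a generic free-field example: narrowband MUSIC pseudo-spectra computed per subband and incoherently averaged across frequency. This is a standard subspace sketch for a uniform linear array, not the paper's rigid-body-aware estimator, which retains the transfer-function diversity rather than assuming free-field steering vectors.

```python
import numpy as np

def music_spectrum(Rxx, freq, d, c, angles, n_src=1):
    """Narrowband MUSIC pseudo-spectrum for a free-field uniform linear
    array (an illustrative stand-in for the paper's rigid-body model).

    Rxx: (M, M) sample covariance at subband centre frequency freq;
    d: sensor spacing in metres; c: propagation speed in m/s.
    """
    M = Rxx.shape[0]
    w, V = np.linalg.eigh(Rxx)
    En = V[:, :M - n_src]          # noise subspace (smallest eigenvalues)
    m = np.arange(M)
    spec = []
    for th in angles:
        a = np.exp(-2j * np.pi * freq * d * m * np.sin(th) / c)  # steering vector
        spec.append(1.0 / np.real(a.conj() @ En @ En.conj().T @ a))
    return np.array(spec)

def broadband_doa(covs, freqs, d, c, angles, n_src=1):
    """Incoherently average per-subband MUSIC spectra, one common way of
    combining the narrowband components of a broadband signal."""
    total = sum(music_spectrum(R, f, d, c, angles, n_src)
                for R, f in zip(covs, freqs))
    return angles[np.argmax(total)]
```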
"A Free-Source Method (FrSM) for Calibrating a Large-Aperture Microphone Array"
Sarthak Khanal, H. Silverman, Rahul R. Shakya
IEEE Transactions on Audio, Speech, and Language Processing, August 2013. DOI: 10.1109/TASL.2013.2256896

Abstract: Large-aperture microphone arrays can be used to capture and enhance speech from individual talkers in noisy, multi-talker, and reverberant environments. However, they must be calibrated, often more than once, to obtain accurate three-dimensional coordinates for all microphones. Direct-measurement techniques, such as a measuring tape or a laser-based tool, are cumbersome and time-consuming. Some previous methods that used acoustic signals for array calibration required bulky hardware and/or fixed, known source locations. Others, which allowed more flexible source placement, often have trouble with real data, have reported results in 2D only, or work only for small arrays. This paper describes a complete and robust method for automatic calibration using acoustic signals that is simple, repeatable, and accurate, and has been shown to work for a real system. The method requires only a single transducer (loudspeaker) with a microphone attached above its center. The unit is moved freely around the focal volume of the microphone array, generating a single long recording from all the microphones; after that, the system is completely automatic. We describe the free-source method (FrSM), validate its effectiveness, and present accuracy results against measured ground truth. The performance of FrSM is compared to that of several other methods for a real 128-microphone array.