IEEE Trans. Speech Audio Process. — Latest Publications

Blind single channel deconvolution using nonstationary signal processing
IEEE Trans. Speech Audio Process. Pub Date: 2003-08-26  DOI: 10.1109/TSA.2003.815522
J. Hopgood, P. Rayner
Abstract: Blind deconvolution is fundamental in signal processing applications and, in particular, the single channel case remains a challenging and formidable problem. This paper considers single channel blind deconvolution in the case where the degraded observed signal may be modeled as the convolution of a nonstationary source signal with a stationary distortion operator. The important feature that the source is nonstationary while the channel is stationary facilitates the unambiguous identification of either the source or channel, and deconvolution is possible, whereas if the source and channel are both stationary, identification is ambiguous. The parameters of the channel are estimated by modeling the source as a time-varying AR process and the distortion by an all-pole filter, and using the Bayesian framework for parameter estimation. This estimate can then be used to deconvolve the observed signal. In contrast to the classical histogram approach for estimating the channel poles, which merely relies on the fact that the channel is actually stationary rather than modeling it as such, the proposed Bayesian method takes the channel's stationarity into account in the model and, consequently, is more robust. The properties of this model are investigated, and the advantage of utilizing the nonstationarity of a system rather than considering it as a curse is discussed.
Pages: 476-488
Citations: 58
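A minimal sketch, in Python with illustrative parameter values, of the observation model this abstract describes: a time-varying AR source filtered by a stationary all-pole channel, and the way a channel estimate would then be used to deconvolve the observation. The Bayesian parameter estimation itself is not reproduced; `c_hat` simply stands in for the estimated channel.

```python
# Minimal sketch of the observation model: a nonstationary (time-varying AR)
# source convolved with a stationary all-pole channel. Parameter values and
# the block structure are illustrative only.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
n_blocks, block_len = 8, 512          # piecewise-stationary approximation of the TVAR source

# Source: AR(2) whose pole radius/angle drift from block to block (nonstationarity).
source = []
for b in range(n_blocks):
    radius = 0.80 + 0.15 * b / (n_blocks - 1)
    angle = np.pi * (0.15 + 0.05 * b)
    a_src = [1.0, -2 * radius * np.cos(angle), radius ** 2]   # AR polynomial A_b(z)
    source.append(lfilter([1.0], a_src, rng.standard_normal(block_len)))
source = np.concatenate(source)

# Channel: fixed all-pole distortion 1 / C(z); this is what the Bayesian scheme
# would estimate (here we just apply it to build the observed signal).
c = [1.0, -1.2, 0.7]                   # hypothetical stationary channel denominator C(z)
observed = lfilter([1.0], c, source)

# Given an estimate c_hat of C(z), deconvolution is simply x_hat = C(z) y:
c_hat = c                              # stand-in for the Bayesian channel estimate
source_hat = lfilter(c_hat, [1.0], observed)
print("reconstruction error:", np.max(np.abs(source_hat - source)))
```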
A new approach to utterance verification based on neighborhood information in model space
IEEE Trans. Speech Audio Process. Pub Date: 2003-08-26  DOI: 10.1109/TSA.2003.815821
Hui Jiang, Chin-Hui Lee
Abstract: We propose to use neighborhood information in model space to perform utterance verification (UV). First, we present a nested-neighborhood structure for each underlying model in model space and assume the underlying model's competing models sit in one of these neighborhoods, which is used to model the alternative hypothesis in UV. Bayes factors (BF) are then introduced to UV and used as the major tool to calculate confidence measures based on this idea. Experimental results in the Bell Labs communicator system show that the new method dramatically improves verification performance when verifying correct words against mis-recognized words in the recognizer's output, with a relative reduction of more than 20% in equal error rate (EER) compared with the standard approach based on likelihood ratio testing and anti-models.
Pages: 425-434
Citations: 29
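A toy illustration of the kind of confidence score the abstract describes: the likelihood of a hypothesized model is compared against models in a "neighborhood" around it, which stands in for the alternative hypothesis. The Gaussian observation models, the neighborhood construction, and the acceptance threshold are all assumptions made for illustration, not the paper's exact recipe.

```python
# Neighborhood-based confidence measure, sketched with Gaussian models:
# score = log-likelihood of target model minus log of the average likelihood
# over its neighborhood (a crude "Bayes factor"-style statistic).
import numpy as np
from scipy.stats import multivariate_normal

def log_lik(frames, mean, cov):
    return multivariate_normal(mean=mean, cov=cov).logpdf(frames).sum()

rng = np.random.default_rng(1)
dim = 2
target_mean = np.zeros(dim)
# Neighborhood: perturbed copies of the target model act as the competing models.
neighborhood = [target_mean + rng.normal(scale=1.5, size=dim) for _ in range(5)]

frames = rng.normal(loc=target_mean, scale=1.0, size=(40, dim))   # observed feature frames

ll_target = log_lik(frames, target_mean, np.eye(dim))
ll_alt = np.logaddexp.reduce([log_lik(frames, m, np.eye(dim)) for m in neighborhood]) \
         - np.log(len(neighborhood))                               # average alternative likelihood
confidence = ll_target - ll_alt
print("accept" if confidence > 0.0 else "reject", confidence)
```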
A perceptually motivated approach for speech enhancement
IEEE Trans. Speech Audio Process. Pub Date: 2003-08-26  DOI: 10.1109/TSA.2003.815936
Y. Hu, P. Loizou
Abstract: A new perceptually motivated approach is proposed for enhancement of speech corrupted by colored noise. The proposed approach takes into account the frequency masking properties of the human auditory system and reduces the perceptual effect of the residual noise. This new perceptual method is incorporated into a frequency-domain speech enhancement method and a subspace-based speech enhancement method. A better power spectrum/autocorrelation function estimator was also developed to improve the performance of the proposed algorithms. Objective measures and informal listening tests demonstrated significant improvements over other methods when tested with TIMIT sentences corrupted by various types of colored noise.
Pages: 457-465
Citations: 99
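A simplified sketch of the general idea of perceptually weighted noise suppression: the spectral gain is floored so that residual noise is only pushed down to a masking-threshold-like level rather than suppressed maximally. The masking threshold used below is a crude smoothed spectrum, not a psychoacoustic model, and the whole routine is an illustrative stand-in for the frequency-domain variant described in the abstract.

```python
# Perceptually weighted spectral gain (sketch): a Wiener-like gain is lower-
# bounded so residual noise sits just below a stand-in masking threshold,
# avoiding over-suppression where the noise would be inaudible anyway.
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import uniform_filter1d

def enhance(noisy, noise_psd, fs=8000):
    f, t, Y = stft(noisy, fs=fs, nperseg=256)
    power = np.abs(Y) ** 2
    mask_thr = 0.1 * uniform_filter1d(power, size=9, axis=0)      # crude masking threshold
    gain = np.maximum(1.0 - noise_psd[:, None] / np.maximum(power, 1e-12), 0.0)
    floor = np.sqrt(np.minimum(mask_thr / np.maximum(noise_psd[:, None], 1e-12), 1.0))
    gain = np.maximum(gain, floor)                                 # perceptual gain floor
    _, x_hat = istft(gain * Y, fs=fs, nperseg=256)
    return x_hat

fs = 8000
rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
noise = 0.3 * rng.standard_normal(fs)
noise_psd = (np.abs(stft(noise, fs=fs, nperseg=256)[2]) ** 2).mean(axis=1)
print(enhance(clean + noise, noise_psd, fs).shape)
```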
Audio source separation of convolutive mixtures
IEEE Trans. Speech Audio Process. Pub Date: 2003-08-26  DOI: 10.1109/TSA.2003.815820
N. Mitianoudis, M. Davies
Abstract: The problem of separating audio sources recorded in a real-world situation is well established in modern literature. A method to solve this problem is blind source separation (BSS) using independent component analysis (ICA). The recording environment is usually modeled as convolutive. Previous research on ICA of instantaneous mixtures provided a solid background for the separation of convolved mixtures. The authors review current approaches to the subject and propose a fast frequency-domain ICA framework, providing a solution for the apparent permutation problem encountered in these methods.
Pages: 489-497
Citations: 153
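The permutation problem mentioned in the abstract arises because each STFT bin is separated independently, so the separated outputs in neighboring bins may be arbitrarily swapped. Below is a small sketch of one common alignment strategy, correlating amplitude envelopes across bins; the per-bin ICA stage is assumed already done and the envelopes are synthetic, so this is an illustration of the problem, not the paper's specific framework.

```python
# Permutation alignment for frequency-domain ICA outputs (sketch): adjacent
# bins are aligned by maximizing the correlation of their amplitude envelopes
# with a running reference envelope.
import numpy as np
from itertools import permutations

def align_permutations(envelopes):
    """envelopes: array (n_bins, n_srcs, n_frames) of |separated STFT| per bin."""
    n_bins, n_srcs, _ = envelopes.shape
    aligned = envelopes.copy()
    ref = aligned[0]                                  # running reference envelope
    for k in range(1, n_bins):
        best, best_score = None, -np.inf
        for perm in permutations(range(n_srcs)):
            score = sum(np.corrcoef(ref[i], aligned[k, p])[0, 1]
                        for i, p in enumerate(perm))
            if score > best_score:
                best, best_score = perm, score
        aligned[k] = aligned[k, list(best)]
        ref = 0.8 * ref + 0.2 * aligned[k]            # smooth reference over frequency
    return aligned

rng = np.random.default_rng(3)
base = np.abs(rng.standard_normal((2, 200)))          # two source activity patterns
env = np.stack([base + 0.1 * rng.standard_normal((2, 200)) for _ in range(64)])
env[::2] = env[::2, ::-1]                             # artificially swap every other bin
print(align_permutations(env).shape)                  # (64, 2, 200), swaps resolved
```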
Fast model selection based speaker adaptation for nonnative speech
IEEE Trans. Speech Audio Process. Pub Date: 2003-07-28  DOI: 10.1109/TSA.2003.814379
Xiaodong He, Yunxin Zhao
Abstract: The problem of adapting acoustic models of native English speech to nonnative speakers is addressed from a perspective of adaptive model complexity selection. The goal is to select model complexity dynamically for each nonnative talker so as to optimize the balance between model robustness to pronunciation variations and model detailedness for discrimination of speech sounds. A maximum expected likelihood (MEL) based technique is proposed to enable reliable complexity selection when adaptation data are sparse, where the expectation of the log-likelihood (EL) of adaptation data is computed based on distributions of mismatch biases between model and data, and model complexity is selected to maximize EL. The MEL based complexity selection is further combined with MLLR (maximum likelihood linear regression) to enable adaptation of both complexity and parameters of acoustic models. Experiments were performed on WSJ1 data of speakers with a wide range of foreign accents. Results show that the MEL based complexity selection is feasible when using as little as one adaptation utterance, and it is able to select dynamically the proper model complexity as the adaptation data increases. Compared with the standard MLLR, the MEL+MLLR method leads to consistent and significant improvement to recognition accuracy on nonnative speakers, without performance degradation on native speakers.
Pages: 298-307
Citations: 25
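A toy stand-in for the complexity-selection idea: given a small amount of adaptation data, score pre-trained models of increasing complexity (here, one-dimensional Gaussian mixtures with more components) and keep the complexity with the highest per-frame log-likelihood. The MEL criterion in the paper additionally takes the expectation over a distribution of mismatch biases; that refinement, and all model parameters below, are omitted or invented for illustration.

```python
# Toy model-complexity selection by likelihood of sparse adaptation data.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def gmm_loglik(x, weights, means, stds):
    comp = np.log(weights) + norm.logpdf(x[:, None], means, stds)   # (n, k)
    return logsumexp(comp, axis=1).mean()

# Pre-trained "native" models of growing complexity (illustrative parameters).
models = {
    1: (np.array([1.0]), np.array([0.0]), np.array([2.0])),
    2: (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])),
    4: (np.array([0.25] * 4), np.array([-1.5, -0.5, 0.5, 1.5]), np.array([0.6] * 4)),
}

rng = np.random.default_rng(4)
adaptation_data = rng.normal(loc=0.8, scale=1.2, size=30)     # sparse "nonnative" data

scores = {k: gmm_loglik(adaptation_data, *m) for k, m in models.items()}
best = max(scores, key=scores.get)
print("selected complexity:", best, scores)
```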
A new duration modeling approach for Mandarin speech
IEEE Trans. Speech Audio Process. Pub Date: 2003-07-28  DOI: 10.1109/TSA.2003.814377
Sin-Horng Chen, Wen-Hsing Lai, Yih-Ru Wang
Abstract: A new duration modeling approach for Mandarin speech is proposed. It explicitly models several major affecting factors by multiplicative companding factors (CFs) and estimates all model parameters with an EM algorithm. The three basic Tone 3 patterns (i.e., full tone, half tone and sandhi tone) are also properly considered, using three different CFs to separate how they affect syllable duration. Experimental results show that the variance of the syllable duration is greatly reduced from 180.17 to 2.52 frames² (1 frame = 5 ms) when the model is used to eliminate the effects of those affecting factors. Moreover, the estimated CFs of those affecting factors agree well with prior linguistic knowledge. Two extensions of the duration modeling method are also performed. One is the use of the same technique to model initial and final durations. The other is to replace the multiplicative model with an additive one. Lastly, a preliminary study of applying the proposed model to predict syllable duration for TTS (text-to-speech) is also performed. Experimental results show that it outperforms the conventional regressive prediction method.
Pages: 308-320
Citations: 36
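A simplified fit of a multiplicative companding-factor model of the form the abstract describes: syllable duration equals a base duration scaled by a CF per categorical factor (here only tone). Taking logs turns the product into a linear model that ordinary least squares can fit; the paper estimates the CFs jointly with an EM algorithm instead, and the data and factor set below are synthetic.

```python
# Multiplicative duration model fitted in the log domain (simplified sketch).
import numpy as np

rng = np.random.default_rng(5)
tones = rng.integers(0, 5, size=500)                  # factor: tone category 0..4
true_cf = np.array([1.00, 0.90, 1.10, 1.25, 0.80])    # true companding factors
base = 30.0                                           # base duration in frames
durations = base * true_cf[tones] * np.exp(0.05 * rng.standard_normal(500))

# Design matrix: intercept (log base) + one-hot tone columns (log CFs),
# with tone 0 as the reference so the system is identifiable.
X = np.column_stack([np.ones(500)] + [(tones == t).astype(float) for t in range(1, 5)])
coef, *_ = np.linalg.lstsq(X, np.log(durations), rcond=None)

est_base = np.exp(coef[0])
est_cf = np.concatenate([[1.0], np.exp(coef[1:])])
print("base:", round(est_base, 2), "CFs:", np.round(est_cf, 3))
```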
High-fidelity multichannel audio coding with Karhunen-Loeve transform
IEEE Trans. Speech Audio Process. Pub Date: 2003-07-28  DOI: 10.1109/TSA.2003.814375
Dai Yang, H. Ai, C. Kyriakakis, C.-C. Jay Kuo
Abstract: A new quality-scalable high-fidelity multichannel audio compression algorithm based on MPEG-2 advanced audio coding (AAC) is presented. The Karhunen-Loeve transform (KLT) is applied to multichannel audio signals in the preprocessing stage to remove interchannel redundancy. Then, signals in decorrelated channels are compressed by a modified AAC main profile encoder. Finally, a channel transmission control mechanism is used to re-organize the bitstream so that the multichannel audio bitstream has a quality scalable property when it is transmitted over a heterogeneous network. Experimental results show that, compared with AAC, the proposed algorithm achieves a better performance while maintaining a similar computational complexity at the regular bit rate of 64 kbit/sec/ch. When the bitstream is transmitted to narrowband end users at a lower bit rate, packets in some channels can be dropped, and slightly degraded, yet full-channel, audio can still be reconstructed in a reasonable fashion without any additional computational cost.
Pages: 365-380
Citations: 34
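A minimal sketch of the KLT preprocessing stage: the interchannel covariance is eigendecomposed, the channels are projected onto the eigenvectors (decorrelated), and the inverse transform reconstructs the original channels. Zeroing the weakest KLT channels mimics the quality-scalable dropping of packets described in the abstract; the AAC coding stage itself is not shown, and the signals are synthetic.

```python
# Interchannel KLT: decorrelate channels, optionally drop weak ones, invert.
import numpy as np

rng = np.random.default_rng(6)
n_ch, n_samp = 5, 4096
common = rng.standard_normal(n_samp)
audio = np.stack([common + 0.2 * rng.standard_normal(n_samp) for _ in range(n_ch)])

cov = np.cov(audio)                            # (n_ch, n_ch) interchannel covariance
eigval, eigvec = np.linalg.eigh(cov)           # ascending eigenvalues
order = np.argsort(eigval)[::-1]
klt = eigvec[:, order].T                       # KLT matrix, rows = principal directions

decorrelated = klt @ audio                     # these channels would be AAC-coded
decorrelated[-2:] = 0.0                        # simulate dropping the two weakest channels
reconstructed = klt.T @ decorrelated           # inverse transform (klt is orthogonal)
print("SNR (dB):", 10 * np.log10(np.sum(audio**2) / np.sum((audio - reconstructed)**2)))
```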
Perceptual phase quantization of speech
IEEE Trans. Speech Audio Process. Pub Date: 2003-07-28  DOI: 10.1109/TSA.2003.814409
Doh-Suk Kim
Abstract: It is essential to incorporate perceptual characteristics of human hearing in modern speech/audio coding systems. However, the focus has been confined only to the magnitude information of speech, and little attention has been paid to phase information. A quantitative study on the characteristics of human phase perception is presented and a novel method is proposed for the quantization of phase information in speech/audio signals. First, the just-noticeable difference (JND) of phase for each harmonic in flat-spectrum periodic tones is measured for several different fundamental frequencies. Then, a mathematical model of JND is established, based on measured data, to form a weighting function for phase quantization. Since the proposed weighting function is derived from psychoacoustic measurements, it provides a novel quantization method by which more bits are assigned to perceptually important phase components at the sacrifice of less important ones, resulting in a quantized signal perceptually closer to the original one. Experimental results on five vowel speech signals demonstrate that the proposed weighting function is very effective for the quantization of phase information.
Pages: 355-364
Citations: 23
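An illustrative sketch of weighted phase quantization: each harmonic's phase is quantized with a bit count proportional to an importance weight, so perceptually sensitive harmonics are quantized more finely. The weighting curve below is made up (a simple decay over harmonic index) purely for illustration; the paper derives its weighting function from measured phase JNDs.

```python
# Per-harmonic phase quantization with perceptually weighted bit allocation.
import numpy as np

def quantize_phase(phases, bits):
    """Uniform quantization of phases in [-pi, pi) with per-harmonic bit counts."""
    levels = 2.0 ** bits
    step = 2 * np.pi / levels
    return np.round((phases + np.pi) / step) * step - np.pi

rng = np.random.default_rng(7)
n_harm = 20
phases = rng.uniform(-np.pi, np.pi, size=n_harm)

weight = 1.0 / (1.0 + np.arange(n_harm))            # hypothetical importance weighting
total_bits = 60
bits = np.maximum(np.round(total_bits * weight / weight.sum()), 1).astype(int)

q = quantize_phase(phases, bits)
print("bits per harmonic:", bits)
print("max phase error (rad), low vs high harmonics:",
      np.abs(q - phases)[:5].max(), np.abs(q - phases)[-5:].max())
```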
A generalized subspace approach for enhancing speech corrupted by colored noise
IEEE Trans. Speech Audio Process. Pub Date: 2003-07-28  DOI: 10.1109/TSA.2003.814458
Y. Hu, P. Loizou
Abstract: A generalized subspace approach is proposed for enhancement of speech corrupted by colored noise. A nonunitary transform, based on the simultaneous diagonalization of the clean speech and noise covariance matrices, is used to project the noisy signal onto a signal-plus-noise subspace and a noise subspace. The clean signal is estimated by nulling the signal components in the noise subspace and retaining the components in the signal subspace. The applied transform has built-in prewhitening and can therefore be used in general for colored noise. The proposed approach is shown to be a generalization of the approach proposed by Y. Ephraim and H.L. Van Trees (see ibid., vol.3, p.251-66, 1995) for white noise. Two estimators are derived based on the nonunitary transform, one based on time-domain constraints and one based on spectral domain constraints. Objective and subjective measures demonstrate improvements over other subspace-based methods when tested with TIMIT sentences corrupted with speech-shaped noise and multi-talker babble.
Pages: 334-341
Citations: 406
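A sketch of the nonunitary transform at the core of this approach: `scipy.linalg.eigh` solves the generalized eigenproblem R_clean v = λ R_noise v (the simultaneous diagonalization), and a gain of the form λ/(λ+μ) is applied in that eigenbasis. The covariance matrices, the Lagrange multiplier μ, and the frame-wise application below are illustrative assumptions built on synthetic AR "speech" and colored noise, not the paper's tuned estimator.

```python
# Time-domain-constrained subspace estimator via simultaneous diagonalization.
import numpy as np
from scipy.linalg import eigh, toeplitz
from scipy.signal import lfilter

def autocov(x, order):
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) - 1 + order] / len(x)
    return toeplitz(r)

rng = np.random.default_rng(8)
n, order, mu = 8000, 16, 1.0
clean = lfilter([1.0], [1.0, -1.5, 0.7], rng.standard_normal(n))      # AR(2) "speech"
noise = lfilter([1.0, 0.8], [1.0], rng.standard_normal(n))            # colored noise

R_clean, R_noise = autocov(clean, order), autocov(noise, order)
lam, V = eigh(R_clean, R_noise)                    # simultaneous diagonalization
gain = np.maximum(lam, 0.0) / (np.maximum(lam, 0.0) + mu)
H = np.linalg.solve(V.T, np.diag(gain) @ V.T)      # H = V^{-T} G V^{T}

noisy = clean + noise
frames = noisy[: (n // order) * order].reshape(-1, order)
enhanced = (frames @ H.T).ravel()                  # apply estimator frame by frame
print("error energy before/after:",
      round(np.var(noisy - clean), 3),
      round(np.var(enhanced - clean[:len(enhanced)]), 3))
```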
Joint filterbanks for echo cancellation and audio coding
IEEE Trans. Speech Audio Process. Pub Date: 2003-07-28  DOI: 10.1109/TSA.2003.814798
P. Eneroth
Abstract: Joint structures for audio coding and echo cancellation are investigated, utilizing standard audio coders. Two types of audio coders are considered: coders based on cosine modulated filterbanks and coders based on the modified discrete cosine transform (MDCT). For the first coder type, two methods for combining such a coder with a subband echo canceller are proposed: a modified audio coder filterbank that is suitable for echo cancellation but still generates the same final decomposition as the standard audio coder filterbank, and another that converts subband signals between an audio coder filterbank and a filterbank designed for echo cancellation. For the MDCT based audio coder, a joint structure with a frequency-domain adaptive filter based echo canceller is considered. Computational complexity and transmission delay for the different coder/echo canceller combinations are presented. Convergence properties of the proposed echo canceller structures are shown using simulations with real-life recorded speech.
Pages: 342-354
Citations: 11
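A sketch of the kind of subband/frequency-domain adaptive echo canceller that such joint structures combine with the coder's filterbank: far-end and microphone signals are split into STFT bins and a short complex NLMS filter per bin tracks the echo path. The STFT stands in for the coder filterbank, and the echo path, step size, and tap count are illustrative; the filterbank-sharing and conversion structures from the paper are not shown.

```python
# Per-bin complex NLMS echo canceller on STFT subband signals (sketch).
import numpy as np
from scipy.signal import stft, lfilter

rng = np.random.default_rng(9)
fs, n = 8000, 4 * 8000
far_end = rng.standard_normal(n)
echo_path = rng.standard_normal(256) * np.exp(-np.arange(256) / 50.0)
mic = lfilter(echo_path, [1.0], far_end)               # pure echo, no near-end talk

_, _, X = stft(far_end, fs=fs, nperseg=256)             # (bins, frames)
_, _, D = stft(mic, fs=fs, nperseg=256)

taps, mu, eps = 4, 0.5, 1e-6
n_bins, n_frames = X.shape
W = np.zeros((n_bins, taps), dtype=complex)             # per-bin adaptive filters
E = D.copy()                                            # error (unprocessed until adapted)

for t in range(taps, n_frames):
    Xbuf = X[:, t - taps + 1:t + 1][:, ::-1]            # most recent frame first
    Y = np.sum(W.conj() * Xbuf, axis=1)                 # echo estimate per bin
    E[:, t] = D[:, t] - Y
    norm = np.sum(np.abs(Xbuf) ** 2, axis=1) + eps
    W += mu * (E[:, t].conj() / norm)[:, None] * Xbuf   # NLMS update

erle = 10 * np.log10(np.mean(np.abs(D) ** 2) / np.mean(np.abs(E) ** 2))
print("echo reduction (dB):", round(erle, 1))
```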