Latest Articles in IEEE Trans. Speech Audio Process.

Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition
IEEE Trans. Speech Audio Process. Pub Date: 2003-02-19 DOI: 10.1109/TSA.2003.809121
U. Chaudhari, Jirí Navrátil, Stephane H. Maes
Abstract: We present a transformation-based, multigrained data modeling technique in the context of text-independent speaker recognition, aimed at mitigating difficulties caused by sparse training and test data. Both identification and verification are addressed, where we view the entire population as divided into the target population and its complement, which we refer to as the background population. First, we present our development of maximum likelihood transformation based recognition with diagonally constrained Gaussian mixture models and show its robustness to data scarcity with results on identification. Then, for each target and background speaker, a multigrained model is constructed using the transformation-based extension as a building block. The training data are labeled with an HMM-based phone labeler. We then make use of a graduated phone class structure to train the speaker model at various levels of detail. This structure is a tree with the root node containing all the phones; subsequent levels partition the phones into increasingly fine-grained linguistic classes. This method affords the use of fine detail where possible, i.e., as reflected in the amount of training data distributed to each tree node. We demonstrate the effectiveness of the modeling with verification experiments in matched and mismatched conditions.
Pages: 61-69
Citations: 33
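The abstract above builds on diagonally constrained Gaussian mixture models scored through maximum likelihood feature transformations. The following is a minimal NumPy sketch of that scoring idea only, assuming a single linear transform per speaker; the paper's multigrained phone-class tree and its transform estimation procedure are not reproduced here, and all function names are illustrative:

```python
import numpy as np

def diag_gmm_loglik(x, weights, means, variances):
    """Total log-likelihood of frames x under a diagonal-covariance GMM.

    x: (T, D) frames; weights: (K,); means, variances: (K, D)."""
    diff = x[:, None, :] - means[None, :, :]                  # (T, K, D)
    exponent = -0.5 * np.sum(diff**2 / variances, axis=-1)    # (T, K)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=-1)  # (K,)
    log_comp = np.log(weights) + log_norm + exponent          # (T, K)
    # Log-sum-exp over components, summed over frames.
    m = log_comp.max(axis=1, keepdims=True)
    return np.sum(m.squeeze(1) + np.log(np.exp(log_comp - m).sum(axis=1)))

def transformed_loglik(x, A, gmm):
    """Score frames after a speaker-specific linear feature transform A.

    The Jacobian term T * log|det A| keeps likelihoods comparable
    across speakers with different transforms."""
    T = x.shape[0]
    return diag_gmm_loglik(x @ A.T, *gmm) + T * np.log(abs(np.linalg.det(A)))
```

For identification, each enrolled speaker's (transform, GMM) pair scores the test utterance and the arg-max wins; for verification the target score is compared against background-speaker scores.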
Multichannel affine and fast affine projection algorithms for active noise control and acoustic equalization systems
IEEE Trans. Speech Audio Process. Pub Date: 2003-02-19 DOI: 10.1109/TSA.2002.805642
M. Bouchard
Abstract: In the field of adaptive signal processing, it is well known that affine projection algorithms, or their low-complexity implementations known as fast affine projection algorithms, can produce a good tradeoff between convergence speed and computational complexity. Although these algorithms typically do not provide the same convergence speed as recursive-least-squares algorithms, they can provide a much improved convergence speed compared to stochastic gradient descent algorithms, without the large increase in computational load or the instability often found in recursive-least-squares algorithms. In this paper, multichannel affine and fast affine projection algorithms are introduced for active noise control or acoustic equalization. Multichannel fast affine projection algorithms have been previously published for acoustic echo cancellation, but the problem of active noise control or acoustic equalization is a very different one, leading to different structures, as explained in the paper. The computational complexity of the new algorithms is evaluated, and it is shown through simulations that not only can the new algorithms provide the expected tradeoff between convergence performance and computational complexity, they can also provide the best convergence performance (even over recursive-least-squares algorithms) when nonideal noisy acoustic plant models are used in the adaptive systems.
Pages: 54-60
Citations: 127
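The paper assumes familiarity with the basic affine projection update, in which the filter adapts along the span of the last P input vectors at once. A hedged single-channel sketch of that update in NumPy follows; the multichannel ANC structures that are the paper's contribution are beyond this fragment, and the parameter values are illustrative:

```python
import numpy as np

def apa_filter(x, d, N=8, P=4, mu=0.5, delta=1e-4):
    """Single-channel affine projection adaptive filter (sketch).

    x: input signal, d: desired signal, N: filter length,
    P: projection order, mu: step size, delta: regularization.
    Returns the final weights and the a priori error signal."""
    w = np.zeros(N)
    e_hist = np.zeros(len(x))
    for n in range(P + N - 1, len(x)):
        # N x P matrix whose columns are the last P input vectors.
        X = np.column_stack(
            [x[n - p - N + 1:n - p + 1][::-1] for p in range(P)])
        d_vec = np.array([d[n - p] for p in range(P)])
        e = d_vec - X.T @ w
        # Project the update through the regularized P x P Gram matrix.
        w += mu * X @ np.linalg.solve(X.T @ X + delta * np.eye(P), e)
        e_hist[n] = e[0]
    return w, e_hist
```

With P = 1 this reduces to NLMS; larger P buys faster convergence for correlated inputs at the cost of a P x P solve per sample, which is exactly the tradeoff the fast affine projection variants attack.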
Bounded support Gaussian mixture modeling of speech spectra
IEEE Trans. Speech Audio Process. Pub Date: 2003-02-19 DOI: 10.1109/TSA.2002.805639
J. Lindblom, J. Samuelsson
Abstract: Lately, Gaussian mixture (GM) models have found new applications in speech processing, and particularly in speech coding. This paper provides a review of GM based quantization and prediction. The main contribution is a discussion on GM model optimization. Two previously presented algorithms of EM type are analyzed in some detail, and models are estimated and evaluated experimentally using theoretical measures as well as GM based speech spectrum coding and prediction. It has been argued that since many sources have a bounded support, this should be utilized in both the choice of model and the optimization algorithm. By low-dimensional modeling examples, illustrating the behavior of the two algorithms graphically, and by full-scale evaluation of GM based systems, the advantages of a bounded support approach are quantified. For all evaluation techniques in the study, model accuracy is improved when the bounded support approach is adopted. The gains are typically largest for models with diagonal covariance matrices.
Pages: 88-99
Citations: 66
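The bounded support argument can be illustrated in one dimension: renormalizing a Gaussian density to the data's support assigns strictly higher likelihood to in-support observations than the unbounded density does. A small standard-library sketch of that principle follows; it illustrates the truncated density only, not the paper's EM-type optimization algorithms:

```python
import math

def normal_pdf(x, mu, sigma):
    """Unbounded Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_pdf(x, mu, sigma, a, b):
    """Gaussian density renormalized to the bounded support [a, b].

    Outside the support the density is zero; inside, dividing by the
    in-support probability mass restores unit total probability."""
    if x < a or x > b:
        return 0.0
    mass = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass
```

Because the renormalizing mass is below one, every in-support point gains likelihood, which is the per-component effect the paper exploits inside a mixture.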
From the editor-in-chief
IEEE Trans. Speech Audio Process. Pub Date: 2003-01-01 DOI: 10.1109/TSA.2003.815277
I. Trancoso
Pages: 297
Citations: 0
Stereophonic acoustic echo cancellation using lattice orthogonalization
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.804537
K. Mayyas
Abstract: Stereophonic teleconferencing provides more natural acoustic perception by virtue of its enhanced sound localization. Of paramount importance is stereophonic acoustic echo cancellation (SAEC), which poses a difficult challenge to low-complexity adaptive algorithms, mainly because of the strong cross-correlation between the two channel input signals. This paper proposes a transform-domain two-channel lattice algorithm that inherently decorrelates the stereo signals. The algorithm, however, bears a high computational complexity for large filter orders N. A low-complexity O(4N) algorithm is developed by employing the functionality of the two-channel lattice cell of the previous algorithm in a weighted subband scheme. The algorithm is capable of producing completely orthogonal subbands of the stereo signals and also allows a tradeoff between performance and complexity. The performance of the proposed algorithms is compared with other existing algorithms via simulations using actual teleconferencing room impulse responses.
Pages: 517-525
Citations: 18
Application of time-frequency principal component analysis to text-independent speaker identification
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.800557
I. Magrin-Chagnolleau, G. Durou, F. Bimbot
Abstract: We propose a formalism, called vector filtering of spectral trajectories, that allows the integration of a number of speech parameterization approaches (cepstral analysis, Δ and ΔΔ parameterizations, autoregressive vector modeling, ...) under a common formalism. We then propose a new filtering, called contextual principal components (CPC) or time-frequency principal components (TFPC). This filtering consists in extracting the principal components of the contextual covariance matrix, which is the covariance matrix of a sequence of vectors expanded by their context. We apply this new filtering in the framework of closed-set speaker identification, using a subset of the POLYCOST database. When using speaker-dependent TFPC filters, our results show a relative improvement of approximately 20% compared to the use of the classical cepstral coefficients augmented by their Δ-coefficients, which is significantly better with a 90% confidence level.
Pages: 371-378
Citations: 24
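The contextual principal component construction is concrete enough to sketch: expand each spectral frame with its temporal neighbors, estimate the covariance of the expanded vectors, and project onto the leading eigenvectors. A minimal NumPy sketch follows, with `context` and `n_components` as illustrative parameters rather than the paper's settings:

```python
import numpy as np

def tfpc_filter(frames, context=2, n_components=12):
    """Contextual (time-frequency) principal component sketch.

    frames: (T, D) array of spectral feature vectors. Each frame is
    expanded with its +/- `context` neighbors; the eigenvectors of
    the resulting contextual covariance matrix define the projection.
    Returns the (T - 2*context, n_components) projected trajectory."""
    T, D = frames.shape
    expanded = np.stack([frames[t - context:t + context + 1].ravel()
                         for t in range(context, T - context)])
    expanded = expanded - expanded.mean(axis=0)
    cov = np.cov(expanded, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues.
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return expanded @ top
```

With context = 0 this degenerates to ordinary PCA on single frames; the temporal expansion is what lets the learned components capture trajectory (delta-like) information jointly with the spectral shape.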
Improved audio coding using a psychoacoustic model based on a cochlear filter bank
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.804536
F. Baumgarte
Abstract: Perceptual audio coders use an estimated masked threshold to determine the maximum permissible just-inaudible noise level introduced by quantization. This estimate is derived from a psychoacoustic model mimicking the properties of masking. Most psychoacoustic models for coding applications use a uniform (equal bandwidth) spectral decomposition as a first step to approximate the frequency selectivity of the human auditory system. However, the equal filter properties of the uniform subbands do not match the nonuniform characteristics of cochlear filters and reduce the precision of psychoacoustic modeling. Even so, uniform filter banks are applied because they are computationally efficient. This paper presents a psychoacoustic model based on an efficient nonuniform cochlear filter bank and a simple masked threshold estimation. The novel filter-bank structure employs cascaded low-order IIR filters and appropriate down-sampling to increase efficiency. The filter responses are optimized for the modeling of auditory masking effects. Results of the new psychoacoustic model applied to audio coding show better performance in terms of bit rate and/or quality in comparison with other state-of-the-art models using a uniform spectral decomposition. The low delay of the new model is particularly suitable for low-delay coders.
Pages: 495-503
Citations: 36
Text-independent speaker verification using utterance level scoring and covariance modeling
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.803419
Ran D. Zilca
Abstract: This paper describes a computationally simple method to perform text-independent speaker verification using second order statistics. The suggested method, called utterance level scoring (ULS), allows one to obtain a normalized score using a single pass through the frames of the tested utterance. The utterance sample covariance is first calculated and then compared to the speaker covariance using a distortion measure. Subsequently, a distortion measure between the utterance covariance and the sample covariance of data taken from different speakers is used to normalize the score. Experimental results from the 2000 NIST speaker recognition evaluation are presented for ULS, used with different distortion measures, and for a Gaussian mixture model (GMM) system. The results indicate that ULS is a viable alternative to GMM whenever computational complexity and verification accuracy need to be traded off.
Pages: 363-370
Citations: 14
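The single-pass scoring idea can be sketched directly: compare the utterance sample covariance to the claimed speaker's covariance, then normalize by the distortion to background covariances. The symmetric trace-based distortion below is one simple choice that is zero exactly when the two covariances coincide; it is an assumption for illustration, not necessarily one of the measures evaluated in the paper:

```python
import numpy as np

def cov_distortion(c1, c2):
    """Symmetric distortion between SPD covariance matrices.

    For eigenvalues l of c1 @ inv(c2), this averages l + 1/l - 2,
    which is nonnegative and zero iff c1 == c2."""
    d = c1.shape[0]
    a = np.trace(c1 @ np.linalg.inv(c2)) / d
    b = np.trace(c2 @ np.linalg.inv(c1)) / d
    return a + b - 2.0

def uls_score(utterance, speaker_cov, background_covs):
    """Utterance-level score: distortion to the claimed speaker's
    covariance, normalized by the mean distortion to background
    covariances. Lower scores favor the claimed speaker."""
    c_utt = np.cov(utterance, rowvar=False)
    d_spk = cov_distortion(c_utt, speaker_cov)
    d_bg = np.mean([cov_distortion(c_utt, c) for c in background_covs])
    return d_spk / d_bg
```

Everything here is a single covariance accumulation plus a few small matrix operations per trial, which is the computational appeal ULS trades against GMM accuracy.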
Perceptual audio coding using adaptive pre- and post-filters and lossless compression
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.803444
G. Schuller, Bin Yu, Dawei Huang, B. Edler
Abstract: This paper proposes a versatile perceptual audio coding method that achieves high compression ratios and is capable of low encoding/decoding delay. It accommodates a variety of source signals (including both music and speech) with different sampling rates. It is based on separating irrelevance and redundancy reductions into independent functional units. This contrasts with traditional audio coding, where both are integrated within the same subband decomposition. The separation allows for the independent optimization of the irrelevance and redundancy reduction units. For both reductions, we rely on adaptive filtering and predictive coding as much as possible to minimize the delay. A psychoacoustically controlled adaptive linear filter is used for the irrelevance reduction, and the redundancy reduction is carried out by a predictive lossless coding scheme, termed the weighted cascaded least mean squared (WCLMS) method. Experiments are carried out on a database of moderate size which contains mono signals of different sampling rates and varying nature (music, speech, or mixed). They show that the proposed WCLMS lossless coder outperforms other competing lossless coders in terms of compression ratios and delay, as applied to the pre-filtered signal. Moreover, a subjective listening test of the combined pre-filter/lossless coder and a state-of-the-art perceptual audio coder (PAC) shows that the new method achieves a comparable compression ratio and audio quality with a lower delay.
Pages: 379-390
Citations: 75
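The redundancy reduction stage is a predictive lossless coder. The sketch below substitutes a plain NLMS predictor for the paper's weighted cascaded WCLMS scheme, but it shows the property the architecture relies on: encoder and decoder run identical predictors driven by identical data, so the integer residuals reconstruct the signal exactly:

```python
import numpy as np

def lms_residual(x, order=4, mu=0.1):
    """Encode integer samples x as prediction residuals.

    An NLMS predictor estimates each sample from the previous
    `order` samples; the rounded prediction error is the residual."""
    w = np.zeros(order)
    res = np.zeros(len(x), dtype=np.int64)
    for n in range(len(x)):
        past = x[max(0, n - order):n][::-1].astype(float)
        past = np.pad(past, (0, order - len(past)))
        pred = int(round(w @ past))
        res[n] = x[n] - pred
        w += mu * res[n] * past / (past @ past + 1e-8)
    return res

def lms_reconstruct(res, order=4, mu=0.1):
    """Decode: mirror the encoder's predictor on the rebuilt signal."""
    w = np.zeros(order)
    x = np.zeros(len(res), dtype=np.int64)
    for n in range(len(res)):
        past = x[max(0, n - order):n][::-1].astype(float)
        past = np.pad(past, (0, order - len(past)))
        pred = int(round(w @ past))
        x[n] = pred + res[n]
        w += mu * res[n] * past / (past @ past + 1e-8)
    return x
```

In the paper's system the residuals would then go to an entropy coder; the compression gain comes from the residuals being much smaller than the pre-filtered samples.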
Robust speech recognition using probabilistic union models
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.803439
J. Ming, P. Jančovič, F. J. Smith
Abstract: This paper introduces a new statistical approach, namely the probabilistic union model, for speech recognition involving partial, unknown frequency-band corruption. Partial frequency-band corruption accounts for the effect of a family of real-world noises. Previous methods based on the missing feature theory usually require the identity of the noisy bands. This identification can be difficult for unexpected noise with unknown, time-varying band characteristics. The new model combines the local frequency-band information based on the union of random events, to reduce the dependence of the model on information about the noise. This model partially accomplishes the target: offering robustness to partial frequency-band corruption, while requiring no information about the noise. This paper introduces the theory and implementation of the union model, and is focused on several important advances. These new developments include a new algorithm for automatic order selection, a generalization of the modeling principle to accommodate partial feature stream corruption, and a combination of the union model with conventional noise reduction techniques to deal with a mixture of stationary noise and unknown, nonstationary noise. For the evaluation, we used the TIDIGITS database for speaker-independent connected digit recognition. The utterances were corrupted by various types of additive noise, stationary or time-varying, assuming no knowledge about the noise characteristics. The results indicate that the new model offers significantly improved robustness in comparison to other models.
Pages: 403-414
Citations: 46
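The union combination itself is easy to sketch: with N frequency bands and an assumed corruption order m, the score sums likelihood products over every subset of N - m bands, so one badly corrupted band cannot veto the correct hypothesis the way a pure product rule does. A minimal sketch over per-band log-likelihoods, as an illustration of the combination rule rather than the paper's full recognizer:

```python
import numpy as np
from itertools import combinations

def union_score(band_logliks, order):
    """Probabilistic-union combination of per-band log-likelihoods.

    band_logliks: per-band log-likelihoods of one hypothesis.
    order: assumed number of corrupted bands (0 = plain product rule).
    Sums the likelihood products of every subset of N - order bands."""
    n = len(band_logliks)
    keep = n - order
    total = 0.0
    for subset in combinations(range(n), keep):
        total += np.exp(sum(band_logliks[i] for i in subset))
    return np.log(total)
```

With order 0 the sum has a single term and the score collapses to the ordinary log-product over all bands; the automatic order selection the abstract mentions chooses `order` without knowing which bands are noisy.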