Latest Articles in IEEE Trans. Speech Audio Process.

Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition
IEEE Trans. Speech Audio Process. Pub Date: 2003-02-19 DOI: 10.1109/TSA.2003.809121
U. Chaudhari, Jirí Navrátil, Stephane H. Maes
Abstract: We present a transformation-based, multigrained data modeling technique in the context of text-independent speaker recognition, aimed at mitigating difficulties caused by sparse training and test data. Both identification and verification are addressed, where we view the entire population as divided into the target population and its complement, which we refer to as the background population. First, we present our development of maximum likelihood transformation based recognition with diagonally constrained Gaussian mixture models and show its robustness to data scarcity with results on identification. Then, for each target and background speaker, a multigrained model is constructed using the transformation-based extension as a building block. The training data are labeled with an HMM-based phone labeler. We then make use of a graduated phone class structure to train the speaker model at various levels of detail. This structure is a tree with the root node containing all the phones; subsequent levels partition the phones into increasingly fine-grained linguistic classes. This method affords the use of fine detail where possible, i.e., as reflected in the amount of training data distributed to each tree node. We demonstrate the effectiveness of the modeling with verification experiments in matched and mismatched conditions.
Pages: 61-69
Citations: 33
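The abstract above builds on diagonally constrained Gaussian mixture models scored through maximum likelihood feature transformations. The following is a minimal NumPy sketch of that scoring idea only, assuming a single linear transform per speaker; the paper's multigrained phone-class tree and its transform estimation procedure are not reproduced here, and all function names are illustrative:

```python
import numpy as np

def diag_gmm_loglik(x, weights, means, variances):
    """Total log-likelihood of frames x under a diagonal-covariance GMM.

    x: (T, D) frames; weights: (K,); means, variances: (K, D)."""
    diff = x[:, None, :] - means[None, :, :]                  # (T, K, D)
    exponent = -0.5 * np.sum(diff**2 / variances, axis=-1)    # (T, K)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=-1)  # (K,)
    log_comp = np.log(weights) + log_norm + exponent          # (T, K)
    # Log-sum-exp over components, summed over frames.
    m = log_comp.max(axis=1, keepdims=True)
    return np.sum(m.squeeze(1) + np.log(np.exp(log_comp - m).sum(axis=1)))

def transformed_loglik(x, A, gmm):
    """Score frames after a speaker-specific linear feature transform A.

    The Jacobian term T * log|det A| keeps likelihoods comparable
    across speakers with different transforms."""
    T = x.shape[0]
    return diag_gmm_loglik(x @ A.T, *gmm) + T * np.log(abs(np.linalg.det(A)))
```

For identification, each enrolled speaker's (transform, GMM) pair scores the test utterance and the arg-max wins; for verification the target score is compared against background-speaker scores.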
Multichannel affine and fast affine projection algorithms for active noise control and acoustic equalization systems
IEEE Trans. Speech Audio Process. Pub Date: 2003-02-19 DOI: 10.1109/TSA.2002.805642
M. Bouchard
Abstract: In the field of adaptive signal processing, it is well known that affine projection algorithms, or their low-complexity implementations known as fast affine projection algorithms, can produce a good tradeoff between convergence speed and computational complexity. Although these algorithms typically do not provide the same convergence speed as recursive-least-squares algorithms, they can provide a much improved convergence speed compared to stochastic gradient descent algorithms, without the large increase in computational load or the instability often found in recursive-least-squares algorithms. In this paper, multichannel affine and fast affine projection algorithms are introduced for active noise control or acoustic equalization. Multichannel fast affine projection algorithms have been previously published for acoustic echo cancellation, but the problem of active noise control or acoustic equalization is a very different one, leading to different structures, as explained in the paper. The computational complexity of the new algorithms is evaluated, and it is shown through simulations that not only can the new algorithms provide the expected tradeoff between convergence performance and computational complexity, they can also provide the best convergence performance (even over recursive-least-squares algorithms) when nonideal noisy acoustic plant models are used in the adaptive systems.
Pages: 54-60
Citations: 127
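The paper assumes familiarity with the basic affine projection update, in which the filter adapts along the span of the last P input vectors at once. A hedged single-channel sketch of that update in NumPy follows; the multichannel ANC structures that are the paper's contribution are beyond this fragment, and the parameter values are illustrative:

```python
import numpy as np

def apa_filter(x, d, N=8, P=4, mu=0.5, delta=1e-4):
    """Single-channel affine projection adaptive filter (sketch).

    x: input signal, d: desired signal, N: filter length,
    P: projection order, mu: step size, delta: regularization.
    Returns the final weights and the a priori error signal."""
    w = np.zeros(N)
    e_hist = np.zeros(len(x))
    for n in range(P + N - 1, len(x)):
        # N x P matrix whose columns are the last P input vectors.
        X = np.column_stack(
            [x[n - p - N + 1:n - p + 1][::-1] for p in range(P)])
        d_vec = np.array([d[n - p] for p in range(P)])
        e = d_vec - X.T @ w
        # Project the update through the regularized P x P Gram matrix.
        w += mu * X @ np.linalg.solve(X.T @ X + delta * np.eye(P), e)
        e_hist[n] = e[0]
    return w, e_hist
```

With P = 1 this reduces to NLMS; larger P buys faster convergence for correlated inputs at the cost of a P x P solve per sample, which is exactly the tradeoff the fast affine projection variants attack.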
Bounded support Gaussian mixture modeling of speech spectra
IEEE Trans. Speech Audio Process. Pub Date: 2003-02-19 DOI: 10.1109/TSA.2002.805639
J. Lindblom, J. Samuelsson
Abstract: Lately, Gaussian mixture (GM) models have found new applications in speech processing, and particularly in speech coding. This paper provides a review of GM based quantization and prediction. The main contribution is a discussion on GM model optimization. Two previously presented algorithms of EM type are analyzed in some detail, and models are estimated and evaluated experimentally using theoretical measures as well as GM based speech spectrum coding and prediction. It has been argued that since many sources have a bounded support, this should be utilized in both the choice of model and the optimization algorithm. By low-dimensional modeling examples, illustrating the behavior of the two algorithms graphically, and by full-scale evaluation of GM based systems, the advantages of a bounded support approach are quantified. For all evaluation techniques in the study, model accuracy is improved when the bounded support approach is adopted. The gains are typically largest for models with diagonal covariance matrices.
Pages: 88-99
Citations: 66
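The bounded support argument can be illustrated in one dimension: renormalizing a Gaussian density to the data's support assigns strictly higher likelihood to in-support observations than the unbounded density does. A small standard-library sketch of that principle follows; it illustrates the truncated density only, not the paper's EM-type optimization algorithms:

```python
import math

def normal_pdf(x, mu, sigma):
    """Unbounded Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_pdf(x, mu, sigma, a, b):
    """Gaussian density renormalized to the bounded support [a, b].

    Outside the support the density is zero; inside, dividing by the
    in-support probability mass restores unit total probability."""
    if x < a or x > b:
        return 0.0
    mass = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass
```

Because the renormalizing mass is below one, every in-support point gains likelihood, which is the per-component effect the paper exploits inside a mixture.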
From the editor-in-chief
IEEE Trans. Speech Audio Process. Pub Date: 2003-01-01 DOI: 10.1109/TSA.2003.815277
I. Trancoso
Pages: 297
Citations: 0
Stereophonic acoustic echo cancellation using lattice orthogonalization
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.804537
K. Mayyas
Abstract: Stereophonic teleconferencing provides more natural acoustic perception by virtue of its enhanced sound localization. Of paramount importance is stereophonic acoustic echo cancellation (SAEC), which poses a difficult challenge to low-complexity adaptive algorithms, mainly because of the strong cross-correlation between the two channel input signals. This paper proposes a transform-domain two-channel lattice algorithm that inherently decorrelates the stereo signals. The algorithm, however, bears a high computational complexity for large filter orders N. A low-complexity O(4N) algorithm is developed by employing the functionality of the two-channel lattice cell of the previous algorithm in a weighted subband scheme. The algorithm is capable of producing completely orthogonal subbands of the stereo signals and also allows a tradeoff between performance and complexity. The performance of the proposed algorithms is compared with other existing algorithms via simulations using actual teleconferencing room impulse responses.
Pages: 517-525
Citations: 18
Application of time-frequency principal component analysis to text-independent speaker identification
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.800557
I. Magrin-Chagnolleau, G. Durou, F. Bimbot
Abstract: We propose a formalism, called vector filtering of spectral trajectories, that allows the integration of a number of speech parameterization approaches (cepstral analysis, Δ and ΔΔ parameterizations, autoregressive vector modeling, ...) under a common formalism. We then propose a new filtering, called contextual principal components (CPC) or time-frequency principal components (TFPC). This filtering consists in extracting the principal components of the contextual covariance matrix, which is the covariance matrix of a sequence of vectors expanded by their context. We apply this new filtering in the framework of closed-set speaker identification, using a subset of the POLYCOST database. When using speaker-dependent TFPC filters, our results show a relative improvement of approximately 20% compared to the use of the classical cepstral coefficients augmented by their Δ-coefficients, which is significantly better with a 90% confidence level.
Pages: 371-378
Citations: 24
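The contextual principal component construction is concrete enough to sketch: expand each spectral frame with its temporal neighbors, estimate the covariance of the expanded vectors, and project onto the leading eigenvectors. A minimal NumPy sketch follows, with `context` and `n_components` as illustrative parameters rather than the paper's settings:

```python
import numpy as np

def tfpc_filter(frames, context=2, n_components=12):
    """Contextual (time-frequency) principal component sketch.

    frames: (T, D) array of spectral feature vectors. Each frame is
    expanded with its +/- `context` neighbors; the eigenvectors of
    the resulting contextual covariance matrix define the projection.
    Returns the (T - 2*context, n_components) projected trajectory."""
    T, D = frames.shape
    expanded = np.stack([frames[t - context:t + context + 1].ravel()
                         for t in range(context, T - context)])
    expanded = expanded - expanded.mean(axis=0)
    cov = np.cov(expanded, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues.
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return expanded @ top
```

With context = 0 this degenerates to ordinary PCA on single frames; the temporal expansion is what lets the learned components capture trajectory (delta-like) information jointly with the spectral shape.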
Improved audio coding using a psychoacoustic model based on a cochlear filter bank
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.804536
F. Baumgarte
Abstract: Perceptual audio coders use an estimated masked threshold to determine the maximum permissible just-inaudible noise level introduced by quantization. This estimate is derived from a psychoacoustic model mimicking the properties of masking. Most psychoacoustic models for coding applications use a uniform (equal bandwidth) spectral decomposition as a first step to approximate the frequency selectivity of the human auditory system. However, the equal filter properties of the uniform subbands do not match the nonuniform characteristics of cochlear filters and reduce the precision of psychoacoustic modeling. Even so, uniform filter banks are applied because they are computationally efficient. This paper presents a psychoacoustic model based on an efficient nonuniform cochlear filter bank and a simple masked threshold estimation. The novel filter-bank structure employs cascaded low-order IIR filters and appropriate down-sampling to increase efficiency. The filter responses are optimized for the modeling of auditory masking effects. Results of the new psychoacoustic model applied to audio coding show better performance in terms of bit rate and/or quality in comparison with other state-of-the-art models using a uniform spectral decomposition. The low delay of the new model is particularly suitable for low-delay coders.
Pages: 495-503
Citations: 36
Text-independent speaker verification using utterance level scoring and covariance modeling
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.803419
Ran D. Zilca
Abstract: This paper describes a computationally simple method to perform text-independent speaker verification using second order statistics. The suggested method, called utterance level scoring (ULS), allows one to obtain a normalized score using a single pass through the frames of the tested utterance. The utterance sample covariance is first calculated and then compared to the speaker covariance using a distortion measure. Subsequently, a distortion measure between the utterance covariance and the sample covariance of data taken from different speakers is used to normalize the score. Experimental results from the 2000 NIST speaker recognition evaluation are presented for ULS, used with different distortion measures, and for a Gaussian mixture model (GMM) system. The results indicate that ULS is a viable alternative to GMM whenever computational complexity and verification accuracy need to be traded off.
Pages: 363-370
Citations: 14
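The single-pass scoring idea can be sketched directly: compare the utterance sample covariance to the claimed speaker's covariance, then normalize by the distortion to background covariances. The symmetric trace-based distortion below is one simple choice that is zero exactly when the two covariances coincide; it is an assumption for illustration, not necessarily one of the measures evaluated in the paper:

```python
import numpy as np

def cov_distortion(c1, c2):
    """Symmetric distortion between SPD covariance matrices.

    For eigenvalues l of c1 @ inv(c2), this averages l + 1/l - 2,
    which is nonnegative and zero iff c1 == c2."""
    d = c1.shape[0]
    a = np.trace(c1 @ np.linalg.inv(c2)) / d
    b = np.trace(c2 @ np.linalg.inv(c1)) / d
    return a + b - 2.0

def uls_score(utterance, speaker_cov, background_covs):
    """Utterance-level score: distortion to the claimed speaker's
    covariance, normalized by the mean distortion to background
    covariances. Lower scores favor the claimed speaker."""
    c_utt = np.cov(utterance, rowvar=False)
    d_spk = cov_distortion(c_utt, speaker_cov)
    d_bg = np.mean([cov_distortion(c_utt, c) for c in background_covs])
    return d_spk / d_bg
```

Everything here is a single covariance accumulation plus a few small matrix operations per trial, which is the computational appeal ULS trades against GMM accuracy.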
Perceptual audio coding using adaptive pre- and post-filters and lossless compression
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.803444
G. Schuller, Bin Yu, Dawei Huang, B. Edler
Abstract: This paper proposes a versatile perceptual audio coding method that achieves high compression ratios and is capable of low encoding/decoding delay. It accommodates a variety of source signals (including both music and speech) with different sampling rates. It is based on separating irrelevance and redundancy reductions into independent functional units. This contrasts with traditional audio coding, where both are integrated within the same subband decomposition. The separation allows for the independent optimization of the irrelevance and redundancy reduction units. For both reductions, we rely on adaptive filtering and predictive coding as much as possible to minimize the delay. A psychoacoustically controlled adaptive linear filter is used for the irrelevance reduction, and the redundancy reduction is carried out by a predictive lossless coding scheme, termed the weighted cascaded least mean squared (WCLMS) method. Experiments are carried out on a database of moderate size which contains mono signals of different sampling rates and varying nature (music, speech, or mixed). They show that the proposed WCLMS lossless coder outperforms other competing lossless coders in terms of compression ratios and delay, as applied to the pre-filtered signal. Moreover, a subjective listening test of the combined pre-filter/lossless coder and a state-of-the-art perceptual audio coder (PAC) shows that the new method achieves a comparable compression ratio and audio quality with a lower delay.
Pages: 379-390
Citations: 75
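The redundancy reduction stage is a predictive lossless coder. The sketch below substitutes a plain NLMS predictor for the paper's weighted cascaded WCLMS scheme, but it shows the property the architecture relies on: encoder and decoder run identical predictors driven by identical data, so the integer residuals reconstruct the signal exactly:

```python
import numpy as np

def lms_residual(x, order=4, mu=0.1):
    """Encode integer samples x as prediction residuals.

    An NLMS predictor estimates each sample from the previous
    `order` samples; the rounded prediction error is the residual."""
    w = np.zeros(order)
    res = np.zeros(len(x), dtype=np.int64)
    for n in range(len(x)):
        past = x[max(0, n - order):n][::-1].astype(float)
        past = np.pad(past, (0, order - len(past)))
        pred = int(round(w @ past))
        res[n] = x[n] - pred
        w += mu * res[n] * past / (past @ past + 1e-8)
    return res

def lms_reconstruct(res, order=4, mu=0.1):
    """Decode: mirror the encoder's predictor on the rebuilt signal."""
    w = np.zeros(order)
    x = np.zeros(len(res), dtype=np.int64)
    for n in range(len(res)):
        past = x[max(0, n - order):n][::-1].astype(float)
        past = np.pad(past, (0, order - len(past)))
        pred = int(round(w @ past))
        x[n] = pred + res[n]
        w += mu * res[n] * past / (past @ past + 1e-8)
    return x
```

In the paper's system the residuals would then go to an entropy coder; the compression gain comes from the residuals being much smaller than the pre-filtered samples.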
Robust speech recognition using probabilistic union models
IEEE Trans. Speech Audio Process. Pub Date: 2002-12-10 DOI: 10.1109/TSA.2002.803439
J. Ming, P. Jančovič, F. J. Smith
Abstract: This paper introduces a new statistical approach, namely the probabilistic union model, for speech recognition involving partial, unknown frequency-band corruption. Partial frequency-band corruption accounts for the effect of a family of real-world noises. Previous methods based on the missing feature theory usually require the identity of the noisy bands. This identification can be difficult for unexpected noise with unknown, time-varying band characteristics. The new model combines the local frequency-band information based on the union of random events, to reduce the dependence of the model on information about the noise. This model partially accomplishes the target: offering robustness to partial frequency-band corruption, while requiring no information about the noise. This paper introduces the theory and implementation of the union model, and is focused on several important advances. These new developments include a new algorithm for automatic order selection, a generalization of the modeling principle to accommodate partial feature stream corruption, and a combination of the union model with conventional noise reduction techniques to deal with a mixture of stationary noise and unknown, nonstationary noise. For the evaluation, we used the TIDIGITS database for speaker-independent connected digit recognition. The utterances were corrupted by various types of additive noise, stationary or time-varying, assuming no knowledge about the noise characteristics. The results indicate that the new model offers significantly improved robustness in comparison to other models.
Pages: 403-414
Citations: 46
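The union combination itself is easy to sketch: with N frequency bands and an assumed corruption order m, the score sums likelihood products over every subset of N - m bands, so one badly corrupted band cannot veto the correct hypothesis the way a pure product rule does. A minimal sketch over per-band log-likelihoods, as an illustration of the combination rule rather than the paper's full recognizer:

```python
import numpy as np
from itertools import combinations

def union_score(band_logliks, order):
    """Probabilistic-union combination of per-band log-likelihoods.

    band_logliks: per-band log-likelihoods of one hypothesis.
    order: assumed number of corrupted bands (0 = plain product rule).
    Sums the likelihood products of every subset of N - order bands."""
    n = len(band_logliks)
    keep = n - order
    total = 0.0
    for subset in combinations(range(n), keep):
        total += np.exp(sum(band_logliks[i] for i in subset))
    return np.log(total)
```

With order 0 the sum has a single term and the score collapses to the ordinary log-product over all bands; the automatic order selection the abstract mentions chooses `order` without knowing which bands are noisy.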