IEEE Trans. Speech Audio Process.: Latest Publications

The multimode transform predictive coding paradigm
IEEE Trans. Speech Audio Process., Pub Date: 2003-04-15, DOI: 10.1109/TSA.2003.809195
S. Ramprashad
Abstract: Presented is a new coding paradigm, multimode transform predictive coding (MTPC), which combines speech and audio coding principles in a single coding structure. The paradigm is adaptive: it automatically adjusts how the different coding modules are used based on the input signal. This allows MTPC coders to handle a wider range of signals more robustly than single-configuration (single-mode) transform predictive coding (TPC) designs. A wideband MTPC coder design targeting two-way communication applications at bit rates from 13 to 40 kbit/s is also presented. Subjective absolute category rating test results on speech, speech in noise, and music demonstrate that, for many coding conditions, its performance at 16, 24, and 32 kbit/s meets or exceeds that of ITU-T Rec. G.722 at 48, 56, and 64 kbit/s, respectively. Subjective Reference-ABx (R-ABx) tests are also included to show the potential advantages of the multimode coder over a single-mode TPC coder. Finally, possible improvements to the MTPC coder design for applications that are less sensitive to delay and encoder complexity, such as broadcasting, are discussed.
Pages: 117-129
Citations: 17
Iterated partitioned block frequency-domain adaptive filtering for acoustic echo cancellation
IEEE Trans. Speech Audio Process., Pub Date: 2003-04-15, DOI: 10.1109/TSA.2003.809194
K. Eneman, M. Moonen
Abstract: For high-quality acoustic echo cancellation, long echoes have to be suppressed. Classical LMS-based adaptive filters are not attractive, as they are suboptimal from a computational point of view. Multirate adaptive filters such as the partitioned block frequency-domain adaptive filter (PBFDAF) are good alternatives and are widely used in commercial echo cancellers today. This paper analyzes the PBFDRAP, which combines frequency-domain adaptive filtering with so-called "row action projection." Fast versions of the algorithm are derived, and it is shown that the PBFDRAP outperforms the PBFDAF in a realistic echo cancellation setup.
Pages: 143-158
Citations: 5
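The abstract contrasts frequency-domain methods with "classical LMS-based adaptive filters". A plain time-domain normalized LMS (NLMS) echo canceller makes that baseline concrete. The Python sketch below shows only this baseline. It does not implement the PBFDAF or PBFDRAP algorithms of the paper. The function name `nlms_echo_cancel`, the values of `taps`, `mu` and `eps`, and the echo path in the usage block are illustrative assumptions. For each sample, both filtering and updating cost on the order of `taps` multiply-adds. Block frequency-domain methods aim to reduce this per-sample burden for long echo paths.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Hypothetical time-domain NLMS echo canceller (baseline sketch).

    far_end: loudspeaker signal x[n]; mic: microphone signal d[n] containing echo.
    Returns (residual e[n], final echo-path estimate w).
    """
    w = [0.0] * taps      # adaptive estimate of the echo path
    x_buf = [0.0] * taps  # last `taps` far-end samples, newest first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        x_buf = [x_n] + x_buf[:-1]                      # shift in the newest sample
        y_n = sum(wi * xi for wi, xi in zip(w, x_buf))  # echo estimate
        e_n = d_n - y_n                                 # echo-cancelled output
        energy = eps + sum(xi * xi for xi in x_buf)     # regularized input energy
        w = [wi + mu * e_n * xi / energy for wi, xi in zip(w, x_buf)]  # NLMS update
        residual.append(e_n)
    return residual, w


if __name__ == "__main__":
    import random

    random.seed(0)
    echo_path = [0.6, -0.3, 0.1]  # made-up echo path for illustration
    x = [random.uniform(-1.0, 1.0) for _ in range(3000)]
    d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n >= k) for n in range(len(x))]
    e, w = nlms_echo_cancel(x, d, taps=4)
    print([round(v, 3) for v in w])  # should approach the echo path, padded with zero
```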
Optimizing feature extraction for speech recognition
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.805644
Chulhee Lee, Donghoon Hyun, E. Choi, Jinwook Go, Chungyong Lee
Abstract: We propose a method to minimize the loss of information during the feature extraction stage of speech recognition by optimizing the parameters of the mel-cepstrum transformation, a transform that is widely used in speech recognition. Typically, the mel-cepstrum is obtained using critical band filters whose characteristics play an important role in converting a speech signal into a sequence of vectors. First, we analyze the performance of the mel-cepstrum by changing filter parameters such as shape, center frequency, and bandwidth. Then we propose an algorithm to optimize the filter parameters using the simplex method. Experiments with Korean digit words show that the recognition rate improved by about 4-7%.
Pages: 80-87
Citations: 56
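The abstract lists the filter parameters being optimized: shape, center frequency, and bandwidth. The Python sketch below roughly shows where those parameters enter the computation. It builds triangular filters with centers spaced uniformly on the mel scale and adds a hypothetical `bandwidth_scale` knob. It then computes a mel-cepstrum as a DCT-II of the log filter-bank energies. Several choices here are common conventions assumed for illustration, not details taken from the paper: the mel formula 2595 * log10(1 + f / 700), the triangular shape, and all function names. The simplex search over the parameters is not shown.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filterbank(n_filters, n_bins, sample_rate, bandwidth_scale=1.0):
    """Triangular filters with centers spaced uniformly on the mel scale.

    n_bins: number of spectrum bins from 0 Hz to Nyquist (inclusive).
    bandwidth_scale (hypothetical): 1.0 puts each filter's edges at the
    neighbouring centers; larger values widen the filters, smaller narrow them.
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    centers = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        c = centers[j]
        lo = c - bandwidth_scale * (c - centers[j - 1])
        hi = c + bandwidth_scale * (centers[j + 1] - c)
        row = []
        for f in bin_freqs:
            if lo < f <= c:
                row.append((f - lo) / (c - lo))  # rising edge
            elif c < f < hi:
                row.append((hi - f) / (hi - c))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_cepstrum(power_spectrum, bank, n_ceps):
    """DCT-II of the log filter-bank energies (first n_ceps coefficients)."""
    log_e = [
        math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), 1e-12))
        for row in bank
    ]
    m = len(log_e)
    return [
        sum(le * math.cos(math.pi * i * (j + 0.5) / m) for j, le in enumerate(log_e))
        for i in range(n_ceps)
    ]
```

An optimizer such as the simplex method would treat values like `bandwidth_scale` (or per-filter centers and edges) as the search variables. It would then score each candidate bank by recognition accuracy.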
A formant filtered physical model for wind instruments
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.807351
A. Nackaerts, B. Moor, R. Lauwereins
Abstract: We report on our research concerning the calibration of physical models for sound synthesis. We combine waveguide physical modeling synthesis with formant filtering by dividing the nonlinear description of the reed mechanism into a nonlinear part and an input-dependent linear filter. We elaborate on the calibration of the model and assess its performance by comparing it to a single-reed, cylindrical-bore instrument, the clarinet.
Pages: 36-44
Citations: 2
A robust online secondary path modeling method with auxiliary noise power scheduling strategy and norm constraint manipulation
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2003.805643
Ming Zhang, H. Lan, W. Ser
Abstract: In many practical active noise control (ANC) applications, online secondary path modeling methods that use auxiliary noise are applied. However, the auxiliary noise contributes to the residual noise and thus degrades the noise control performance of ANC systems. Moreover, a sudden, large change in the secondary path can easily cause existing online secondary path modeling methods to diverge. To mitigate these problems, this paper proposes a new online secondary path modeling method with auxiliary noise power scheduling and adaptive filter norm manipulation. The auxiliary noise power is scheduled based on the convergence status of the ANC system, taking into account the variation of the primary noise, so as to limit the increase in residual noise caused by the auxiliary noise. In addition, norm manipulation is applied to the adaptive filters in the ANC system. Its purpose is to avoid over-updating these filters after a sudden large change in the secondary path, and thus to prevent the ANC system from diverging. Computer simulations show the effectiveness and robustness of the proposed method.
Pages: 45-53
Citations: 86
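The abstract does not state the exact norm manipulation rule. One simple way to keep a sudden secondary-path change from over-updating an adaptive filter is to rescale the weight vector whenever its Euclidean norm exceeds a bound. The Python sketch below assumes that rule and uses a plain LMS update for illustration. `max_norm` and both function names are hypothetical. The auxiliary noise power scheduling part of the method is not shown.

```python
import math


def constrain_norm(weights, max_norm):
    """Rescale a weight vector so its Euclidean norm does not exceed max_norm (assumed rule)."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)  # within the bound: leave unchanged
    scale = max_norm / norm
    return [w * scale for w in weights]  # project back onto the norm ball


def lms_step_with_norm_constraint(weights, x_buf, error, mu, max_norm):
    """Hypothetical helper: one LMS update followed by the norm constraint."""
    updated = [w + mu * error * x for w, x in zip(weights, x_buf)]
    return constrain_norm(updated, max_norm)
```

A large error (for example, right after the secondary path changes) would normally produce a large weight jump. The constraint caps the size of that jump, at the cost of slower tracking when the true filter really does need a larger norm.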
Noise reduction and echo cancellation front-end for speech codecs
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.807350
F. Basbug, K. Swaminathan, S. Nandkumar
Abstract: We present an enhancement front-end for speech codecs that integrates noise reduction and echo cancellation. With these elements, the front-end mitigates the objectionable effects of the two major factors, noise and echo, that adversely affect the quality of most transmission systems, especially when low bit rate codecs are used. The use of this front-end is demonstrated with the 7.4 kbps IS-641 codec (the enhanced full-rate standard for IS-136 systems). The integrated speech-processing unit has the advantage of exploiting the synergy among its components: the voice activity detector in the speech codec, the noise reduction, and the echo canceller. This synergy shows up in two ways. Sharing a number of elements among the unit's components reduces the overall computational complexity, and having the components work together improves performance. The system performs well in both clean and noisy environments and works well with low bit rate codecs.
Pages: 1-13
Citations: 24
On the computation of the Kullback-Leibler measure for spectral distances
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.805641
R. Veldhuis, E. Klabbers
Abstract: Efficient algorithms for the exact and approximate computation of the symmetrical Kullback-Leibler measure for spectral distances are presented for linear predictive coding (LPC) spectra. An interpretation of this measure is given in terms of the poles of the spectra. The performance of the algorithms in terms of accuracy and computational complexity is assessed for the application of computing concatenation costs in unit-selection-based speech synthesis. With the same complexity and storage requirements, the exact method is superior in terms of accuracy.
Pages: 100-103
Citations: 29
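As a reference point for the measure itself, the Python sketch below evaluates it by brute force on a uniform frequency grid. It does not reproduce the paper's efficient exact and approximate algorithms or their pole-based interpretation. The sketch makes several assumptions. The LPC spectrum is taken as gain^2 / |A(e^{jw})|^2 with A(z) = 1 + a1 z^-1 + ... + ap z^-p. Each spectrum is normalized to unit area and treated as a density. The symmetric measure is the sum over frequencies of (p - q) * log(p / q). Because of the normalization, the gain drops out. The function names are hypothetical.

```python
import cmath
import math


def lpc_power_spectrum(a, gain, n_points):
    """gain^2 / |A(e^{jw})|^2 at n_points frequencies in [0, pi),
    assuming A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        spectrum.append(gain * gain / abs(A) ** 2)
    return spectrum


def symmetric_kl(p, q):
    """Symmetric KL between two spectra sampled on the same uniform grid.

    Each spectrum is first normalized to unit sum, which plays the role of
    normalizing the continuous spectrum to unit area.
    """
    sp, sq = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        pa, qb = a / sp, b / sq
        total += (pa - qb) * math.log(pa / qb)  # each term is >= 0
    return total
```

In a unit-selection setting, p and q would be the LPC spectra on either side of a candidate concatenation point. The grid size `n_points` trades accuracy against cost, which is the trade-off the paper's dedicated algorithms are designed to avoid.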
Discriminative training of natural language call routers
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.807352
H. Kuo, Chin-Hui Lee
Abstract: This paper shows how discriminative training can significantly improve classifiers used in natural language processing. The example task is natural language call routing, in which callers are transferred to the desired department based on their natural spoken responses to an open-ended "How may I direct your call?" prompt. In vector-based natural language call routing, callers are transferred using a routing matrix trained on the occurrence statistics of words and word sequences in a training corpus. By re-training the routing matrix parameters with a minimum classification error criterion, a relative error rate reduction of 10-30% was achieved on a banking task. Robustness also increased: with 10% rejection, the error rate was reduced by 40%. Discriminative training also improves portability. We were able to train call routers with the highest known performance using as input only text transcriptions of routed calls, without any human intervention or knowledge of which terms are important or irrelevant for the routing task. This strategy was validated on both the banking task and a more difficult task involving calls to operators in the UK. The proposed formulation is applicable to algorithms addressing a broad range of speech understanding, information retrieval, and topic identification problems.
Pages: 24-35
Citations: 65
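The minimum classification error (MCE) criterion mentioned in the abstract replaces the 0/1 routing error with a smooth sigmoid of a misclassification measure. This lets the routing matrix be trained by gradient descent. The Python sketch below shows one such update under two simplifying assumptions. First, it uses a plain dot-product score between a routing-matrix row and the call's term vector, whereas the paper's vector-based router may use a different similarity, such as a cosine score. Second, the misclassification measure uses only the single best competing destination rather than the usual soft-max over all competitors. Function names and the `gamma` and `lr` values are hypothetical.

```python
import math


def scores(routing, x):
    """Dot-product score of the term vector x against each destination row."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in routing]


def mce_step(routing, x, label, gamma=1.0, lr=0.1):
    """One MCE gradient step on a routing matrix (rows = destinations).

    Returns the updated matrix and the smoothed error before the update.
    """
    g = scores(routing, x)
    # Best-scoring wrong destination (hard-max simplification of the usual soft-max).
    competitor = max((k for k in range(len(g)) if k != label), key=lambda k: g[k])
    d = g[competitor] - g[label]             # misclassification measure: > 0 means wrong
    z = max(-50.0, min(50.0, gamma * d))     # clip to avoid overflow in exp
    loss = 1.0 / (1.0 + math.exp(-z))        # sigmoid: smooth stand-in for the 0/1 error
    step = lr * gamma * loss * (1.0 - loss)  # derivative of the sigmoid, times learning rate
    new = [list(row) for row in routing]
    new[label] = [r + step * xi for r, xi in zip(new[label], x)]            # raise correct score
    new[competitor] = [r - step * xi for r, xi in zip(new[competitor], x)]  # lower competitor
    return new, loss
```

The factor loss * (1 - loss) is largest near the decision boundary. As a result, calls that are clearly right or clearly wrong barely move the matrix, while borderline calls drive most of the re-training.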
Filter bank design for subband adaptive microphone arrays
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.807353
Jan Mark de Haan, N. Grbic, I. Claesson, S. Nordholm
Abstract: This paper presents a new method for designing oversampled uniform DFT filter banks for the special application of subband adaptive beamforming with microphone arrays. Array applications rely on the fact that different source positions give rise to different signal delays, so a beamformer alters the phase information of the signals. This in turn leads to signal degradation when perfect reconstruction filter banks are used for the subband decomposition and reconstruction. The objective of the filter bank design is to minimize the magnitude of each aliasing component individually, so that aliasing distortion is minimized even though phase alterations occur in the subbands. The proposed method is evaluated in a car hands-free mobile telephony environment. The results show that it offers better suppression of disturbing signals and much less distortion of the source speech.
Pages: 14-23
Citations: 69
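The paper's contribution is the design of the prototype filter so that each aliasing component is kept small. The Python sketch below does not reproduce that design. Instead, it swaps in an ordinary Hamming-windowed sinc prototype and shows only the structure of a uniform DFT filter bank, in which every analysis filter is the same prototype modulated to a band center k/K (in cycles per sample). Decimation, and with it the oversampling factor, is omitted. All names and parameter values are assumptions.

```python
import cmath
import math


def prototype_lowpass(length, cutoff):
    """Hamming-windowed sinc lowpass; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    mid = (length - 1) / 2.0
    h = []
    for n in range(length):
        t = n - mid
        sinc = 2.0 * cutoff if t == 0 else math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        h.append(sinc * window)
    return h


def dft_analysis_filters(prototype, n_bands):
    """h_k[n] = h[n] * exp(j 2 pi k n / K) for k = 0 .. K-1 (uniform DFT modulation)."""
    return [
        [hn * cmath.exp(2j * math.pi * k * n / n_bands) for n, hn in enumerate(prototype)]
        for k in range(n_bands)
    ]


def magnitude_response(h, freq):
    """|H(e^{j 2 pi freq})| with freq in cycles/sample."""
    return abs(sum(hn * cmath.exp(-2j * math.pi * freq * n) for n, hn in enumerate(h)))
```

Every band filter is a shifted copy of the prototype's response. This is why the quality of the prototype's stopband largely determines the aliasing left in each subband after decimation, and it is the quantity the paper's design targets.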
Linear regression based Bayesian predictive classification for speech recognition
IEEE Trans. Speech Audio Process., Pub Date: 2003-02-19, DOI: 10.1109/TSA.2002.805640
Jen-Tzung Chien
Abstract: The uncertainty in parameter estimation caused by adverse environments degrades the classification performance of speech recognition. It therefore becomes crucial to incorporate the parameter uncertainty into the decision so that classification robustness can be assured. We propose a novel linear regression based Bayesian predictive classification (LRBPC) for robust speech recognition. This framework is constructed under the paradigm of linear regression adaptation of speech hidden Markov models (HMMs). Because the regression mapping between HMMs and adaptation data is ill-posed, we characterize the uncertainty of the regression parameters using a joint Gaussian distribution. A closed-form predictive distribution can then be derived to set up the LRBPC decision for speech recognition. This decision is more robust than the plug-in maximum a posteriori (MAP) decision adopted in maximum likelihood linear regression (MLLR) and MAP linear regression (MAPLR). Since the specified distribution belongs to the conjugate prior family, evolutionary hyperparameters can be established. With these statistically rich hyperparameters, the LRBPC achieves decision robustness. In experiments, we find that the LRBPC decision attains significantly better recognition performance than MLLR and MAPLR adaptation, in both the general linear regression and the single-variable linear regression cases.
Pages: 70-79
Citations: 28
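The key step in the abstract replaces a plug-in regression estimate with a predictive distribution that integrates over Gaussian-distributed regression parameters. In the scalar case, this has a simple closed form. The Python sketch below assumes a single Gaussian whose mean mu is adapted as a * mu + b, in the spirit of the single-variable linear regression case, with observation variance sigma2 and a joint Gaussian N(m, S) on (a, b). Writing xi = (mu, 1), the observation is then Gaussian with mean xi . m and variance sigma2 + xi^T S xi. This illustrates the predictive-density idea only. It is not the paper's full HMM formulation or its hyperparameter updates, and all names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def plug_in_loglik(x, mu, sigma2, a, b):
    """Point-estimate (plug-in) decision score: log N(x; a*mu + b, sigma2)."""
    return gaussian_logpdf(x, a * mu + b, sigma2)


def predictive_loglik(x, mu, sigma2, m, S):
    """Predictive score when theta = (a, b) ~ N(m, S), S a 2x2 covariance.

    With xi = (mu, 1), x = xi . theta + noise is Gaussian with mean xi . m and
    variance sigma2 + xi^T S xi: regression uncertainty inflates the variance.
    """
    xi = (mu, 1.0)
    mean = xi[0] * m[0] + xi[1] * m[1]
    extra = sum(xi[i] * S[i][j] * xi[j] for i in range(2) for j in range(2))
    return gaussian_logpdf(x, mean, sigma2 + extra)
```

With S set to zero, the predictive score reduces exactly to the plug-in score at (a, b) = m. As S grows, outlying observations are penalized less, which is the kind of robustness to estimation uncertainty the abstract describes.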