IEEE Transactions on Audio Speech and Language Processing: Latest Articles

Convergence Analysis of Narrowband Feedback Active Noise Control System With Imperfect Secondary Path Estimation
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2277934
Liang Wang, W. Gan, Andy W. H. Khong, S. Kuo
Abstract: In many practical active noise control (ANC) applications, a feedback structure that uses an estimated secondary path to synthesize the reference signal is preferred under various conditions. This paper analyzes the convergence behavior of narrowband feedback ANC systems with imperfect secondary path estimation. Existing approaches do not analyze the reference-signal synthesis errors that arise from the interrelated feedback structure. In this paper, the reconstruction error is modeled using the secondary path estimation error, and with this model the effects of estimation errors on the convergence of the feedback ANC system are investigated. To further examine the effects of errors in the filtered-x and filtered-y signal paths, these two paths are analyzed separately to isolate the effects caused by each. Computer simulations are conducted to verify the theoretical analysis presented in the paper.
Citations: 21
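To make the setting concrete, here is a minimal, textbook-style simulation of a narrowband feedback ANC loop in which the reference signal is synthesized from an imperfect secondary-path estimate. It is only a generic illustration of the kind of system being analyzed, not the paper's analysis; the secondary-path coefficients, tone frequency, filter length, and step size are all assumed values.

```python
import numpy as np

fs, f0 = 8000, 200.0            # sample rate (Hz) and narrowband noise frequency (Hz)
N, L, mu = 8000, 32, 0.002      # simulation length, controller length, LMS step size

s_true = np.array([0.0, 0.8, 0.3])   # "true" secondary path (FIR, one-sample delay)
s_hat = np.array([0.0, 0.7, 0.35])   # imperfect estimate used inside the controller

d = np.sin(2 * np.pi * f0 * np.arange(N) / fs)   # primary noise at the error microphone

w = np.zeros(L)                  # adaptive control filter
x_buf = np.zeros(L)              # synthesized-reference buffer (newest sample first)
fx_buf = np.zeros(L)             # filtered-reference buffer (newest sample first)
y_buf = np.zeros(len(s_true))    # recent controller outputs (newest sample first)
e = 0.0
err = np.zeros(N)

for n in range(N):
    # Feedback structure: synthesize the reference as an estimate of the primary
    # noise from the previous error and the estimated secondary-path output.
    x = e + np.dot(s_hat, y_buf)
    x_buf = np.roll(x_buf, 1); x_buf[0] = x

    # Controller output, true secondary-path output, and residual error.
    y = np.dot(w, x_buf)
    y_buf = np.roll(y_buf, 1); y_buf[0] = y
    e = d[n] - np.dot(s_true, y_buf)
    err[n] = e

    # Filtered-x LMS update using the *estimated* secondary path.
    fx = np.dot(s_hat, x_buf[:len(s_hat)])
    fx_buf = np.roll(fx_buf, 1); fx_buf[0] = fx
    w += mu * e * fx_buf

print("mean |e|, first 1000 samples: %.3f, last 1000 samples: %.3f"
      % (np.abs(err[:1000]).mean(), np.abs(err[-1000:]).mean()))
```

Note how both the reference synthesis and the filtered-x signal depend on s_hat, which is exactly the coupling the paper's analysis addresses when s_hat differs from the true path.
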
Unsupervised Spoken Language Understanding for a Multi-Domain Dialog System
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2280212
Donghyeon Lee, Minwoo Jeong, Kyungduk Kim, Seonghan Ryu, G. G. Lee
Abstract: This paper proposes an unsupervised spoken language understanding (SLU) framework for a multi-domain dialog system. Our unsupervised SLU framework applies a non-parametric Bayesian approach to dialog acts, intents, and slot entities, which are the components of a semantic frame. The proposed approach reduces the human effort necessary to obtain a semantically annotated corpus for dialog system development. In this study, we analyze clustering results using various evaluation metrics on four dialog corpora. We also introduce a multi-domain dialog system that uses the unsupervised SLU framework. We argue that our unsupervised approach can help overcome the annotation-acquisition bottleneck in developing dialog systems. To verify this claim, we report a dialog system evaluation in which our method achieves competitive results compared with a system that uses a manually annotated corpus. In addition, we conducted several experiments to explore the effect of our approach on reducing development costs. The results show that our approach can be helpful for the rapid development of a prototype system and for reducing overall development costs.
Citations: 19
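For readers unfamiliar with the non-parametric Bayesian ingredient, the sketch below simulates a Chinese restaurant process, a prior under which the number of clusters (for example, intents) is not fixed in advance but grows with the data. It illustrates the general idea only; it is not the paper's model, and the concentration parameter and utterance count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0                  # concentration parameter (assumed)
assignments = []             # cluster index per "utterance"
counts = []                  # number of utterances per cluster

for n in range(200):         # 200 hypothetical utterances
    # Join an existing cluster with probability proportional to its size,
    # or open a new cluster with probability proportional to alpha.
    probs = np.array(counts + [alpha], dtype=float)
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    if k == len(counts):
        counts.append(1)     # a previously unseen cluster, e.g. a new intent
    else:
        counts[k] += 1
    assignments.append(int(k))

print("clusters discovered:", len(counts), "sizes:", counts)
```
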
Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2263144
T. Virtanen, J. Gemmeke, B. Raj
Abstract: This paper proposes a computationally efficient algorithm for estimating the non-negative weights of linear combinations of the atoms of large-scale audio dictionaries, so that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has been found useful in many audio signal processing tasks, but the existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm is based on iteratively updating a set of active atoms, with the weights updated using the Newton method and the step size estimated such that the weights remain non-negative. Algorithm convergence evaluations on representing audio spectra that are mixtures of two speakers show that with all the tested dictionary sizes the proposed method reaches a much lower value of the divergence than can be obtained by conventional algorithms, and is up to 8 times faster. A source separation evaluation revealed that when using large dictionaries, the proposed method produces better separation quality in less time.
Citations: 70
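As context for the speed comparison, here is the conventional multiplicative update commonly used for this problem: estimate non-negative weights x so that the dictionary approximation B @ x matches an observed magnitude spectrum v under the generalized Kullback-Leibler divergence. This is the slow baseline the paper improves on, not the proposed active-set Newton algorithm, and the dictionary and observation below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K = 257, 1000                      # spectrum bins, dictionary atoms (assumed sizes)
B = rng.random((F, K)) + 1e-3         # non-negative dictionary of atoms
v = B @ rng.random(K)                 # synthetic observation that B can explain

x = np.full(K, 1e-2)                  # non-negative weights to estimate
eps = 1e-12

def kl_div(v, vhat):
    """Generalized Kullback-Leibler divergence D(v || vhat)."""
    return float(np.sum(v * np.log((v + eps) / (vhat + eps)) - v + vhat))

print("initial KL divergence:", kl_div(v, B @ x))
for _ in range(200):
    vhat = B @ x
    # Multiplicative update for the KL objective; keeps x non-negative by construction.
    x *= (B.T @ (v / (vhat + eps))) / (B.sum(axis=0) + eps)
print("final KL divergence:  ", kl_div(v, B @ x))
```

The active-set Newton algorithm in the paper reaches lower divergence values faster by restricting second-order updates to a small set of active atoms.
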
Robust SVD-Based Audio Watermarking Scheme With Differential Evolution Optimization
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2277929
B. Lei, I. Soon, Ee-Leng Tan
Abstract: In this paper, a robust audio watermarking scheme based on singular value decomposition (SVD) and differential evolution (DE) using a dither modulation (DM) quantization algorithm is proposed. Two novel SVD-based algorithms, lifting wavelet transform (LWT)-discrete cosine transform (DCT)-SVD and discrete wavelet transform (DWT)-DCT-SVD, are developed for audio copyright protection. In our method, the LWT/DWT is first applied to decompose the host signal and obtain the corresponding approximation coefficients, followed by the DCT to exploit its "energy compaction" property. SVD is then performed to acquire the singular values and enhance the robustness of the scheme. Adaptive DM quantization is adopted to quantize the singular values and embed the watermark. To withstand desynchronization attacks, a synchronization code is inserted using audio statistical characteristics. Furthermore, the conflicting requirements of robustness and imperceptibility are effectively balanced by the DE optimization. Simulation results demonstrate that both the LWT-DCT-SVD and DWT-DCT-SVD methods not only achieve good imperceptibility but also resist general signal processing, hybrid, and desynchronization attacks. Compared with previous DWT-DCT, support vector regression (SVR)-DWT-DCT, and DWT-SVD methods, our method is more robust against the selected attacks.
Citations: 106
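The dither-modulation (DM) quantization step named in the abstract is a form of quantization index modulation: a singular value is quantized onto one of two interleaved lattices depending on the watermark bit. The sketch below shows only that step on a random block; the LWT/DWT-DCT transform chain, the synchronization code, and the DE optimization are omitted, and the quantization step size is an assumed value.

```python
import numpy as np

delta = 0.5                       # quantization step size (assumed)

def dm_embed(s, bit, delta):
    """Quantize value s onto the lattice associated with `bit` (0 or 1)."""
    d = 0.0 if bit == 0 else delta / 2.0
    return np.round((s - d) / delta) * delta + d

def dm_detect(s, delta):
    """Recover the bit by checking which lattice s lies closer to."""
    e0 = abs(s - dm_embed(s, 0, delta))
    e1 = abs(s - dm_embed(s, 1, delta))
    return 0 if e0 <= e1 else 1

rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8))            # stand-in for a transformed audio block
U, S, Vt = np.linalg.svd(block, full_matrices=False)

bit = 1
S_marked = S.copy()
S_marked[0] = dm_embed(S[0], bit, delta)       # embed into the largest singular value
block_marked = U @ np.diag(S_marked) @ Vt      # watermarked block

S_rx = np.linalg.svd(block_marked, compute_uv=False)
print("embedded:", bit, "detected:", dm_detect(S_rx[0], delta))
```
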
Robust Segments Detector for De-Synchronization Resilient Audio Watermarking
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2279312
Chi-Man Pun, Xiaochen Yuan
Abstract: A robust feature-point detector for invariant audio watermarking is proposed in this paper. The audio segments centered at the detected feature points are extracted for both watermark embedding and extraction. These feature points are invariant to various attacks and change little under processing that preserves high auditory quality. In addition, high robustness and inaudibility are achieved by embedding the watermark into the approximation coefficients of the Stationary Wavelet Transform (SWT) domain, which is shift invariant. The spread spectrum communication technique is adopted to embed the watermark. Experimental results show that the proposed Robust Audio Segments Extractor (RASE) and the watermarking scheme are robust not only against common audio signal processing, such as low-pass filtering, MP3 compression, echo addition, volume change, and normalization, as well as the distortions introduced in the StirMark for Audio benchmark, but also against synchronization (geometric) distortions, such as resample time-scale modification (TSM) with scaling factors up to ±50%, pitch-invariant TSM by ±50%, and tempo-invariant pitch shifting by ±50%. In general, the proposed joint RASE and SWT approach resists various attacks well and performs much better than existing state-of-the-art methods.
Citations: 51
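The embedding itself is plain additive spread spectrum, sketched below on a stand-in for the SWT approximation coefficients of one detected segment: add a scaled pseudo-random sequence whose sign carries the bit, and detect by correlation. The feature-point detector (RASE), the SWT, and the synchronization logic are not reproduced; the embedding strength and segment length are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
segment = rng.standard_normal(4096)               # stand-in for SWT approximation coefficients
pn = rng.choice([-1.0, 1.0], size=segment.size)   # pseudo-random spreading sequence
alpha = 0.1                                       # embedding strength (assumed)

bit = 1                                           # watermark bit in {0, 1}
b = 1.0 if bit == 1 else -1.0
watermarked = segment + alpha * b * pn            # additive spread-spectrum embedding

# Blind detection: correlate with the same PN sequence and take the sign.
corr = np.dot(watermarked, pn) / watermarked.size
detected = 1 if corr > 0 else 0
print("embedded:", bit, "correlation: %.4f" % corr, "detected:", detected)
```
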
Quality Measure Functions for Calibration of Speaker Recognition Systems in Various Duration Conditions
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2279332
M. I. Mandasari, R. Saeidi, Mitchell McLaren, D. V. Leeuwen
Abstract: This paper investigates the effect of utterance duration on the calibration of a modern i-vector speaker recognition system with probabilistic linear discriminant analysis (PLDA) modeling. A calibration approach that deals with these effects using quality measure functions (QMFs) is proposed, in which duration is included in the calibration transformation. Extensive experiments are performed to evaluate the robustness of the proposed calibration approach to conditions unseen during the training of the calibration parameters. Using the latest NIST corpora for evaluation, the results highlight the importance of considering quality metrics such as duration when calibrating the scores of automatic speaker recognition systems.
Citations: 77
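The calibration model in this line of work is typically an affine map of the raw score augmented with quality measure function (QMF) terms, trained with logistic regression. The sketch below fits such a model with a log-duration QMF on synthetic scores that carry a deliberate duration-dependent offset; the data, the particular QMF, and the training details are all assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
labels = rng.integers(0, 2, n)               # 1 = target trial, 0 = non-target trial
dur = rng.uniform(5.0, 120.0, n)             # test-utterance duration in seconds
# Synthetic raw scores with a duration-dependent offset that a duration-blind
# calibration cannot remove.
scores = (2 * labels - 1) * 1.5 - 0.5 * np.log(dur) + rng.standard_normal(n)

# Calibration model: llr(s, d) = w0 + w1 * s + w2 * QMF(d), with QMF(d) = log d.
X = np.column_stack([np.ones(n), scores, np.log(dur)])
w = np.zeros(3)
for _ in range(2000):                        # plain gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * X.T @ (labels - p) / n

calibrated = X @ w                           # duration-compensated calibrated scores
print("learned weights [bias, score, log-duration]:", np.round(w, 2))
```
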
Semi-Blind Noise Extraction Using Partially Known Position of the Target Source
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2264674
Zbyněk Koldovský, J. Málek, P. Tichavský, F. Nesta
Abstract: An extracted noise signal provides important information for subsequent enhancement of a target signal. When the target's position is fixed, the noise extractor can be a target-cancellation filter derived in a noise-free situation. In this paper we consider a situation in which such cancellation filters are prepared in advance for a set of several possible positions of the target. The set of filters is interpreted as prior information available for noise extraction when the target's exact position is unknown. Our novel method looks for a linear combination of the prepared filters via Independent Component Analysis. The method yields a filter with better cancellation performance than the individual filters or filters based on a minimum-variance principle. The method is tested in a highly noisy and reverberant real-world environment with a moving target source and interferers. Post-processing by a Wiener filter using the noise signal extracted by the method is able to improve the signal-to-noise ratio of the target by up to 8 dB.
Citations: 31
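The last sentence of the abstract refers to Wiener post-filtering driven by the extracted noise. The sketch below shows that step in isolation: a per-bin Wiener-style gain computed in the short-time Fourier domain from a separately available noise estimate. The ICA-based combination of cancellation filters is not reproduced; the signals, frame sizes, and noise-estimate quality are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, frame, hop = 16000, 512, 256
t = np.arange(fs) / fs
target = 0.5 * np.sin(2 * np.pi * 440 * t)            # stand-in "target" signal
noise = 0.3 * rng.standard_normal(fs)
noisy = target + noise
noise_est = noise + 0.05 * rng.standard_normal(fs)    # imperfect extracted noise

win = np.hanning(frame)
out = np.zeros_like(noisy)

for start in range(0, len(noisy) - frame, hop):
    Y = np.fft.rfft(win * noisy[start:start + frame])
    N = np.fft.rfft(win * noise_est[start:start + frame])
    # Wiener-style gain: estimated clean power over noisy power, floored at zero.
    gain = np.maximum(1.0 - np.abs(N) ** 2 / (np.abs(Y) ** 2 + 1e-12), 0.0)
    out[start:start + frame] += np.fft.irfft(gain * Y)     # overlap-add synthesis

def snr(ref, sig):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((sig - ref) ** 2))

print("input SNR: %.1f dB, output SNR: %.1f dB" % (snr(target, noisy), snr(target, out)))
```
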
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2270369
N. Mohammadiha, P. Smaragdis, A. Leijon
Abstract: Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
Citations: 370
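To make the setting concrete, here is the standard (non-Bayesian) supervised NMF denoising recipe that this line of work builds on: train separate non-negative dictionaries for speech and noise, estimate activations of the concatenated dictionary on the noisy input, and apply a Wiener-like mask. It is not the BNMF or BNMF-HMM estimator proposed in the paper, and the spectrogram matrices below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-12

def nmf(V, K, iters=100):
    """Multiplicative-update NMF for the KL objective: V ~= W @ H."""
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(iters):
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H

F, T = 257, 200
V_speech = rng.random((F, T))             # training magnitude spectrogram: speech
V_noise = rng.random((F, T))              # training magnitude spectrogram: noise
W_s, _ = nmf(V_speech, K=32)
W_n, _ = nmf(V_noise, K=16)

# Enhancement: keep the concatenated dictionary fixed, update activations only.
V_mix = rng.random((F, 50))               # noisy magnitude spectrogram
W = np.hstack([W_s, W_n])
H = rng.random((W.shape[1], V_mix.shape[1])) + eps
for _ in range(100):
    H *= (W.T @ (V_mix / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)

S_hat = W_s @ H[:32]                      # speech part of the model
N_hat = W_n @ H[32:]                      # noise part of the model
mask = S_hat / (S_hat + N_hat + eps)      # Wiener-like time-frequency mask
enhanced = mask * V_mix                   # enhanced magnitude spectrogram
print("mask range: %.3f to %.3f" % (mask.min(), mask.max()))
```

The paper's contribution is to replace the point estimates above with a Bayesian NMF formulation, combined with an HMM or online noise-model learning to cope with unknown noise types.
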
A Direct Masking Approach to Robust ASR
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2263802
William Hartmann, A. Narayanan, E. Fosler-Lussier, Deliang Wang
Abstract: Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both a small and large vocabulary dataset. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing feature techniques. We also investigate the reasons why other researchers may have not come to this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing feature ASR.
Citations: 43
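A rough sketch of the direct masking pipeline described above: apply a binary time-frequency mask straight to the noisy spectrogram, compute features from the masked spectrogram, and finish with the per-dimension mean-variance normalization the abstract identifies as important. The mask here is an ideal-binary-mask stand-in, and the band-energy features are a simplification of a real cepstral front end; all signals and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, frame, hop = 16000, 400, 160
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
noisy = clean + 0.8 * rng.standard_normal(clean.size)

def spectrogram(x):
    win = np.hanning(frame)
    frames = [win * x[s:s + frame] for s in range(0, len(x) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2   # (T, F) power

P_noisy = spectrogram(noisy)
P_clean = spectrogram(clean)

# Ideal binary mask as a stand-in for an estimated mask: keep time-frequency
# cells where the (crudely estimated) local SNR exceeds 0 dB.
mask = (P_clean > (P_noisy - P_clean).clip(min=1e-12)).astype(float)
P_masked = mask * P_noisy                # direct masking: zero out unreliable cells

def features(P, n_bands=20):
    T, F = P.shape
    edges = np.linspace(0, F, n_bands + 1, dtype=int)
    bands = np.stack([P[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])], axis=1)
    return np.log(bands + 1e-12)

def mvn(X):
    """Per-dimension mean-variance normalization over time."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

feats = mvn(features(P_masked))          # features fed directly to a standard ASR back end
print("feature matrix shape:", feats.shape)
```
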
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2270407
Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, N. Minematsu, K. Hirose
Abstract: This paper proposes a feature enhancement method that can achieve high speech recognition performance in a variety of noise environments with feasible computational cost. Like the well-known Stereo-based Piecewise Linear Compensation for Environments (SPLICE) algorithm, the proposed method learns a piecewise linear transformation to map corrupted feature vectors to the corresponding clean features, which enables efficient operation. To make the feature enhancement process adaptive to changes in noise, the piecewise linear transformation is performed using a subspace of the joint space of corrupted and noise feature vectors, where the subspace is chosen such that the classes (i.e., Gaussian mixture components) of the underlying clean feature vectors can be best predicted. In addition, we propose utilizing temporally adjacent frames of corrupted and noise features in order to leverage the dynamic characteristics of the feature vectors. To prevent overfitting caused by the high dimensionality of the extended feature vectors covering the neighboring frames, we introduce a regularized weighted minimum mean square error criterion. The proposed method achieved relative improvements of 34.2% and 22.2% over SPLICE under the clean and multi-style conditions, respectively, on the Aurora 2 task.
Citations: 7
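For background, the sketch below implements the classical SPLICE-style baseline that the paper extends: a GMM over corrupted features, per-component bias vectors learned from stereo (corrupted, clean) pairs, and enhancement as a posterior-weighted bias correction. The consecutive-frame and noise-feature extensions and the discriminative region weighting proposed in the paper are not included; the data are a low-dimensional synthetic toy example, and the GMM comes from scikit-learn.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
D, K, N = 2, 8, 5000                     # feature dim, mixture components, training frames

clean = rng.standard_normal((N, D))
# Synthetic corruption: a smooth nonlinear distortion plus additive noise.
noisy = clean + 0.8 * np.tanh(clean) + 0.2 * rng.standard_normal((N, D))

gmm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0).fit(noisy)
post = gmm.predict_proba(noisy)          # (N, K) component responsibilities

# Per-component correction: posterior-weighted mean of (clean - noisy).
bias = (post.T @ (clean - noisy)) / post.sum(axis=0)[:, None]   # (K, D)

def enhance(y):
    """Piecewise (per-component) bias correction of noisy features y, shape (T, D)."""
    return y + gmm.predict_proba(y) @ bias

test_clean = rng.standard_normal((1000, D))
test_noisy = test_clean + 0.8 * np.tanh(test_clean) + 0.2 * rng.standard_normal((1000, D))
print("MSE before: %.3f  after: %.3f"
      % (np.mean((test_noisy - test_clean) ** 2),
         np.mean((enhance(test_noisy) - test_clean) ** 2)))
```
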