IEEE Transactions on Audio Speech and Language Processing: Latest Articles

Convergence Analysis of Narrowband Feedback Active Noise Control System With Imperfect Secondary Path Estimation
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2277934
Liang Wang, W. Gan, Andy W. H. Khong, S. Kuo
Abstract: In many practical active noise control (ANC) applications, a feedback structure that uses an estimated secondary path to synthesize the reference signal is preferred under various conditions. This paper analyzes the convergence behavior of narrowband feedback ANC systems with imperfect secondary path estimation. Existing approaches do not analyze the reference-signal synthesis errors that arise from the interrelated feedback structure. In this paper, the reconstruction error is modeled using the secondary path estimation error, and with this model the effects of estimation errors on the convergence of the feedback ANC system are investigated. To further examine the effects of errors in the filtered-x and filtered-y signal paths, these two paths are analyzed separately to isolate the effects caused by each. Computer simulations are conducted to verify the theoretical analysis presented in the paper.
Citations: 21
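To make the setting concrete, here is a minimal, textbook-style simulation of a narrowband feedback ANC loop in which the reference signal is synthesized from an imperfect secondary-path estimate. It is only a generic illustration of the kind of system being analyzed, not the paper's analysis; the secondary-path coefficients, tone frequency, filter length, and step size are all assumed values.

```python
import numpy as np

fs, f0 = 8000, 200.0            # sample rate (Hz) and narrowband noise frequency (Hz)
N, L, mu = 8000, 32, 0.002      # simulation length, controller length, LMS step size

s_true = np.array([0.0, 0.8, 0.3])   # "true" secondary path (FIR, one-sample delay)
s_hat = np.array([0.0, 0.7, 0.35])   # imperfect estimate used inside the controller

d = np.sin(2 * np.pi * f0 * np.arange(N) / fs)   # primary noise at the error microphone

w = np.zeros(L)                  # adaptive control filter
x_buf = np.zeros(L)              # synthesized-reference buffer (newest sample first)
fx_buf = np.zeros(L)             # filtered-reference buffer (newest sample first)
y_buf = np.zeros(len(s_true))    # recent controller outputs (newest sample first)
e = 0.0
err = np.zeros(N)

for n in range(N):
    # Feedback structure: synthesize the reference as an estimate of the primary
    # noise from the previous error and the estimated secondary-path output.
    x = e + np.dot(s_hat, y_buf)
    x_buf = np.roll(x_buf, 1); x_buf[0] = x

    # Controller output, true secondary-path output, and residual error.
    y = np.dot(w, x_buf)
    y_buf = np.roll(y_buf, 1); y_buf[0] = y
    e = d[n] - np.dot(s_true, y_buf)
    err[n] = e

    # Filtered-x LMS update using the *estimated* secondary path.
    fx = np.dot(s_hat, x_buf[:len(s_hat)])
    fx_buf = np.roll(fx_buf, 1); fx_buf[0] = fx
    w += mu * e * fx_buf

print("mean |e|, first 1000 samples: %.3f, last 1000 samples: %.3f"
      % (np.abs(err[:1000]).mean(), np.abs(err[-1000:]).mean()))
```

Note how both the reference synthesis and the filtered-x signal depend on s_hat, which is exactly the coupling the paper's analysis addresses when s_hat differs from the true path.
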
Unsupervised Spoken Language Understanding for a Multi-Domain Dialog System
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2280212
Donghyeon Lee, Minwoo Jeong, Kyungduk Kim, Seonghan Ryu, G. G. Lee
Abstract: This paper proposes an unsupervised spoken language understanding (SLU) framework for a multi-domain dialog system. Our unsupervised SLU framework applies a non-parametric Bayesian approach to dialog acts, intents, and slot entities, which are the components of a semantic frame. The proposed approach reduces the human effort necessary to obtain a semantically annotated corpus for dialog system development. In this study, we analyze clustering results using various evaluation metrics on four dialog corpora. We also introduce a multi-domain dialog system that uses the unsupervised SLU framework. We argue that our unsupervised approach can help overcome the annotation-acquisition bottleneck in developing dialog systems. To verify this claim, we report a dialog system evaluation in which our method achieves competitive results compared with a system that uses a manually annotated corpus. In addition, we conducted several experiments to explore the effect of our approach on reducing development costs. The results show that our approach can be helpful for the rapid development of a prototype system and for reducing overall development costs.
Citations: 19
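For readers unfamiliar with the non-parametric Bayesian ingredient, the sketch below simulates a Chinese restaurant process, a prior under which the number of clusters (for example, intents) is not fixed in advance but grows with the data. It illustrates the general idea only; it is not the paper's model, and the concentration parameter and utterance count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0                  # concentration parameter (assumed)
assignments = []             # cluster index per "utterance"
counts = []                  # number of utterances per cluster

for n in range(200):         # 200 hypothetical utterances
    # Join an existing cluster with probability proportional to its size,
    # or open a new cluster with probability proportional to alpha.
    probs = np.array(counts + [alpha], dtype=float)
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    if k == len(counts):
        counts.append(1)     # a previously unseen cluster, e.g. a new intent
    else:
        counts[k] += 1
    assignments.append(int(k))

print("clusters discovered:", len(counts), "sizes:", counts)
```
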
Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2263144
T. Virtanen, J. Gemmeke, B. Raj
Abstract: This paper proposes a computationally efficient algorithm for estimating the non-negative weights of linear combinations of the atoms of large-scale audio dictionaries, so that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has been found useful in many audio signal processing tasks, but the existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm is based on iteratively updating a set of active atoms, with the weights updated using the Newton method and the step size estimated such that the weights remain non-negative. Algorithm convergence evaluations on representing audio spectra that are mixtures of two speakers show that with all the tested dictionary sizes the proposed method reaches a much lower value of the divergence than can be obtained by conventional algorithms, and is up to 8 times faster. A source separation evaluation revealed that when using large dictionaries, the proposed method produces better separation quality in less time.
Citations: 70
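As context for the speed comparison, here is the conventional multiplicative update commonly used for this problem: estimate non-negative weights x so that the dictionary approximation B @ x matches an observed magnitude spectrum v under the generalized Kullback-Leibler divergence. This is the slow baseline the paper improves on, not the proposed active-set Newton algorithm, and the dictionary and observation below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K = 257, 1000                      # spectrum bins, dictionary atoms (assumed sizes)
B = rng.random((F, K)) + 1e-3         # non-negative dictionary of atoms
v = B @ rng.random(K)                 # synthetic observation that B can explain

x = np.full(K, 1e-2)                  # non-negative weights to estimate
eps = 1e-12

def kl_div(v, vhat):
    """Generalized Kullback-Leibler divergence D(v || vhat)."""
    return float(np.sum(v * np.log((v + eps) / (vhat + eps)) - v + vhat))

print("initial KL divergence:", kl_div(v, B @ x))
for _ in range(200):
    vhat = B @ x
    # Multiplicative update for the KL objective; keeps x non-negative by construction.
    x *= (B.T @ (v / (vhat + eps))) / (B.sum(axis=0) + eps)
print("final KL divergence:  ", kl_div(v, B @ x))
```

The active-set Newton algorithm in the paper reaches lower divergence values faster by restricting second-order updates to a small set of active atoms.
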
Robust SVD-Based Audio Watermarking Scheme With Differential Evolution Optimization
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2277929
B. Lei, I. Soon, Ee-Leng Tan
Abstract: In this paper, a robust audio watermarking scheme based on singular value decomposition (SVD) and differential evolution (DE) using a dither modulation (DM) quantization algorithm is proposed. Two novel SVD-based algorithms, lifting wavelet transform (LWT)-discrete cosine transform (DCT)-SVD and discrete wavelet transform (DWT)-DCT-SVD, are developed for audio copyright protection. In our method, the LWT/DWT is first applied to decompose the host signal and obtain the corresponding approximation coefficients, followed by the DCT to exploit its "energy compaction" property. SVD is then performed to acquire the singular values and enhance the robustness of the scheme. Adaptive DM quantization is adopted to quantize the singular values and embed the watermark. To withstand desynchronization attacks, a synchronization code is inserted using audio statistical characteristics. Furthermore, the conflicting requirements of robustness and imperceptibility are effectively balanced by the DE optimization. Simulation results demonstrate that both the LWT-DCT-SVD and DWT-DCT-SVD methods not only achieve good imperceptibility but also resist general signal processing, hybrid, and desynchronization attacks. Compared with previous DWT-DCT, support vector regression (SVR)-DWT-DCT, and DWT-SVD methods, our method is more robust against the selected attacks.
Citations: 106
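The dither-modulation (DM) quantization step named in the abstract is a form of quantization index modulation: a singular value is quantized onto one of two interleaved lattices depending on the watermark bit. The sketch below shows only that step on a random block; the LWT/DWT-DCT transform chain, the synchronization code, and the DE optimization are omitted, and the quantization step size is an assumed value.

```python
import numpy as np

delta = 0.5                       # quantization step size (assumed)

def dm_embed(s, bit, delta):
    """Quantize value s onto the lattice associated with `bit` (0 or 1)."""
    d = 0.0 if bit == 0 else delta / 2.0
    return np.round((s - d) / delta) * delta + d

def dm_detect(s, delta):
    """Recover the bit by checking which lattice s lies closer to."""
    e0 = abs(s - dm_embed(s, 0, delta))
    e1 = abs(s - dm_embed(s, 1, delta))
    return 0 if e0 <= e1 else 1

rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8))            # stand-in for a transformed audio block
U, S, Vt = np.linalg.svd(block, full_matrices=False)

bit = 1
S_marked = S.copy()
S_marked[0] = dm_embed(S[0], bit, delta)       # embed into the largest singular value
block_marked = U @ np.diag(S_marked) @ Vt      # watermarked block

S_rx = np.linalg.svd(block_marked, compute_uv=False)
print("embedded:", bit, "detected:", dm_detect(S_rx[0], delta))
```
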
Robust Segments Detector for De-Synchronization Resilient Audio Watermarking
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2279312
Chi-Man Pun, Xiaochen Yuan
Abstract: A robust feature-point detector for invariant audio watermarking is proposed in this paper. The audio segments centered at the detected feature points are extracted for both watermark embedding and extraction. These feature points are invariant to various attacks and change little under processing that preserves high auditory quality. In addition, high robustness and inaudibility are achieved by embedding the watermark into the approximation coefficients of the Stationary Wavelet Transform (SWT) domain, which is shift invariant. The spread spectrum communication technique is adopted to embed the watermark. Experimental results show that the proposed Robust Audio Segments Extractor (RASE) and the watermarking scheme are robust not only against common audio signal processing, such as low-pass filtering, MP3 compression, echo addition, volume change, and normalization, as well as the distortions introduced in the StirMark for Audio benchmark, but also against synchronization (geometric) distortions, such as resample time-scale modification (TSM) with scaling factors up to ±50%, pitch-invariant TSM by ±50%, and tempo-invariant pitch shifting by ±50%. In general, the proposed joint RASE and SWT approach resists various attacks well and performs much better than existing state-of-the-art methods.
Citations: 51
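The embedding itself is plain additive spread spectrum, sketched below on a stand-in for the SWT approximation coefficients of one detected segment: add a scaled pseudo-random sequence whose sign carries the bit, and detect by correlation. The feature-point detector (RASE), the SWT, and the synchronization logic are not reproduced; the embedding strength and segment length are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
segment = rng.standard_normal(4096)               # stand-in for SWT approximation coefficients
pn = rng.choice([-1.0, 1.0], size=segment.size)   # pseudo-random spreading sequence
alpha = 0.1                                       # embedding strength (assumed)

bit = 1                                           # watermark bit in {0, 1}
b = 1.0 if bit == 1 else -1.0
watermarked = segment + alpha * b * pn            # additive spread-spectrum embedding

# Blind detection: correlate with the same PN sequence and take the sign.
corr = np.dot(watermarked, pn) / watermarked.size
detected = 1 if corr > 0 else 0
print("embedded:", bit, "correlation: %.4f" % corr, "detected:", detected)
```
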
Quality Measure Functions for Calibration of Speaker Recognition Systems in Various Duration Conditions
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-11-01, DOI: 10.1109/TASL.2013.2279332
M. I. Mandasari, R. Saeidi, Mitchell McLaren, D. V. Leeuwen
Abstract: This paper investigates the effect of utterance duration on the calibration of a modern i-vector speaker recognition system with probabilistic linear discriminant analysis (PLDA) modeling. A calibration approach that deals with these effects using quality measure functions (QMFs) is proposed, in which duration is included in the calibration transformation. Extensive experiments are performed to evaluate the robustness of the proposed calibration approach to conditions unseen during the training of the calibration parameters. Using the latest NIST corpora for evaluation, the results highlight the importance of considering quality metrics such as duration when calibrating the scores of automatic speaker recognition systems.
Citations: 77
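The calibration model in this line of work is typically an affine map of the raw score augmented with quality measure function (QMF) terms, trained with logistic regression. The sketch below fits such a model with a log-duration QMF on synthetic scores that carry a deliberate duration-dependent offset; the data, the particular QMF, and the training details are all assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
labels = rng.integers(0, 2, n)               # 1 = target trial, 0 = non-target trial
dur = rng.uniform(5.0, 120.0, n)             # test-utterance duration in seconds
# Synthetic raw scores with a duration-dependent offset that a duration-blind
# calibration cannot remove.
scores = (2 * labels - 1) * 1.5 - 0.5 * np.log(dur) + rng.standard_normal(n)

# Calibration model: llr(s, d) = w0 + w1 * s + w2 * QMF(d), with QMF(d) = log d.
X = np.column_stack([np.ones(n), scores, np.log(dur)])
w = np.zeros(3)
for _ in range(2000):                        # plain gradient ascent on the log-likelihood
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * X.T @ (labels - p) / n

calibrated = X @ w                           # duration-compensated calibrated scores
print("learned weights [bias, score, log-duration]:", np.round(w, 2))
```
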
Semi-Blind Noise Extraction Using Partially Known Position of the Target Source
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2264674
Zbyněk Koldovský, J. Málek, P. Tichavský, F. Nesta
Abstract: An extracted noise signal provides important information for subsequent enhancement of a target signal. When the target's position is fixed, the noise extractor can be a target-cancellation filter derived in a noise-free situation. In this paper we consider a situation in which such cancellation filters are prepared in advance for a set of several possible positions of the target. The set of filters is interpreted as prior information available for noise extraction when the target's exact position is unknown. Our novel method looks for a linear combination of the prepared filters via Independent Component Analysis. The method yields a filter with better cancellation performance than the individual filters or filters based on a minimum-variance principle. The method is tested in a highly noisy and reverberant real-world environment with a moving target source and interferers. Post-processing by a Wiener filter using the noise signal extracted by the method is able to improve the signal-to-noise ratio of the target by up to 8 dB.
Citations: 31
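The last sentence of the abstract refers to Wiener post-filtering driven by the extracted noise. The sketch below shows that step in isolation: a per-bin Wiener-style gain computed in the short-time Fourier domain from a separately available noise estimate. The ICA-based combination of cancellation filters is not reproduced; the signals, frame sizes, and noise-estimate quality are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, frame, hop = 16000, 512, 256
t = np.arange(fs) / fs
target = 0.5 * np.sin(2 * np.pi * 440 * t)            # stand-in "target" signal
noise = 0.3 * rng.standard_normal(fs)
noisy = target + noise
noise_est = noise + 0.05 * rng.standard_normal(fs)    # imperfect extracted noise

win = np.hanning(frame)
out = np.zeros_like(noisy)

for start in range(0, len(noisy) - frame, hop):
    Y = np.fft.rfft(win * noisy[start:start + frame])
    N = np.fft.rfft(win * noise_est[start:start + frame])
    # Wiener-style gain: estimated clean power over noisy power, floored at zero.
    gain = np.maximum(1.0 - np.abs(N) ** 2 / (np.abs(Y) ** 2 + 1e-12), 0.0)
    out[start:start + frame] += np.fft.irfft(gain * Y)     # overlap-add synthesis

def snr(ref, sig):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((sig - ref) ** 2))

print("input SNR: %.1f dB, output SNR: %.1f dB" % (snr(target, noisy), snr(target, out)))
```
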
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2270369
N. Mohammadiha, P. Smaragdis, A. Leijon
Abstract: Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.
Citations: 370
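To make the setting concrete, here is the standard (non-Bayesian) supervised NMF denoising recipe that this line of work builds on: train separate non-negative dictionaries for speech and noise, estimate activations of the concatenated dictionary on the noisy input, and apply a Wiener-like mask. It is not the BNMF or BNMF-HMM estimator proposed in the paper, and the spectrogram matrices below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-12

def nmf(V, K, iters=100):
    """Multiplicative-update NMF for the KL objective: V ~= W @ H."""
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(iters):
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H

F, T = 257, 200
V_speech = rng.random((F, T))             # training magnitude spectrogram: speech
V_noise = rng.random((F, T))              # training magnitude spectrogram: noise
W_s, _ = nmf(V_speech, K=32)
W_n, _ = nmf(V_noise, K=16)

# Enhancement: keep the concatenated dictionary fixed, update activations only.
V_mix = rng.random((F, 50))               # noisy magnitude spectrogram
W = np.hstack([W_s, W_n])
H = rng.random((W.shape[1], V_mix.shape[1])) + eps
for _ in range(100):
    H *= (W.T @ (V_mix / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)

S_hat = W_s @ H[:32]                      # speech part of the model
N_hat = W_n @ H[32:]                      # noise part of the model
mask = S_hat / (S_hat + N_hat + eps)      # Wiener-like time-frequency mask
enhanced = mask * V_mix                   # enhanced magnitude spectrogram
print("mask range: %.3f to %.3f" % (mask.min(), mask.max()))
```

The paper's contribution is to replace the point estimates above with a Bayesian NMF formulation, combined with an HMM or online noise-model learning to cope with unknown noise types.
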
A Direct Masking Approach to Robust ASR
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2263802
William Hartmann, A. Narayanan, E. Fosler-Lussier, Deliang Wang
Abstract: Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both a small and large vocabulary dataset. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing feature techniques. We also investigate the reasons why other researchers may have not come to this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing feature ASR.
Citations: 43
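A rough sketch of the direct masking pipeline described above: apply a binary time-frequency mask straight to the noisy spectrogram, compute features from the masked spectrogram, and finish with the per-dimension mean-variance normalization the abstract identifies as important. The mask here is an ideal-binary-mask stand-in, and the band-energy features are a simplification of a real cepstral front end; all signals and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, frame, hop = 16000, 400, 160
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
noisy = clean + 0.8 * rng.standard_normal(clean.size)

def spectrogram(x):
    win = np.hanning(frame)
    frames = [win * x[s:s + frame] for s in range(0, len(x) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2   # (T, F) power

P_noisy = spectrogram(noisy)
P_clean = spectrogram(clean)

# Ideal binary mask as a stand-in for an estimated mask: keep time-frequency
# cells where the (crudely estimated) local SNR exceeds 0 dB.
mask = (P_clean > (P_noisy - P_clean).clip(min=1e-12)).astype(float)
P_masked = mask * P_noisy                # direct masking: zero out unreliable cells

def features(P, n_bands=20):
    T, F = P.shape
    edges = np.linspace(0, F, n_bands + 1, dtype=int)
    bands = np.stack([P[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])], axis=1)
    return np.log(bands + 1e-12)

def mvn(X):
    """Per-dimension mean-variance normalization over time."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

feats = mvn(features(P_masked))          # features fed directly to a standard ASR back end
print("feature matrix shape:", feats.shape)
```
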
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting
IEEE Transactions on Audio Speech and Language Processing, Pub Date: 2013-10-01, DOI: 10.1109/TASL.2013.2270407
Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, N. Minematsu, K. Hirose
Abstract: This paper proposes a feature enhancement method that can achieve high speech recognition performance in a variety of noise environments with feasible computational cost. Like the well-known Stereo-based Piecewise Linear Compensation for Environments (SPLICE) algorithm, the proposed method learns a piecewise linear transformation to map corrupted feature vectors to the corresponding clean features, which enables efficient operation. To make the feature enhancement process adaptive to changes in noise, the piecewise linear transformation is performed using a subspace of the joint space of corrupted and noise feature vectors, where the subspace is chosen such that the classes (i.e., Gaussian mixture components) of the underlying clean feature vectors can be best predicted. In addition, we propose utilizing temporally adjacent frames of corrupted and noise features in order to leverage the dynamic characteristics of the feature vectors. To prevent overfitting caused by the high dimensionality of the extended feature vectors covering the neighboring frames, we introduce a regularized weighted minimum mean square error criterion. The proposed method achieved relative improvements of 34.2% and 22.2% over SPLICE under the clean and multi-style conditions, respectively, on the Aurora 2 task.
Citations: 7
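For background, the sketch below implements the classical SPLICE-style baseline that the paper extends: a GMM over corrupted features, per-component bias vectors learned from stereo (corrupted, clean) pairs, and enhancement as a posterior-weighted bias correction. The consecutive-frame and noise-feature extensions and the discriminative region weighting proposed in the paper are not included; the data are a low-dimensional synthetic toy example, and the GMM comes from scikit-learn.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
D, K, N = 2, 8, 5000                     # feature dim, mixture components, training frames

clean = rng.standard_normal((N, D))
# Synthetic corruption: a smooth nonlinear distortion plus additive noise.
noisy = clean + 0.8 * np.tanh(clean) + 0.2 * rng.standard_normal((N, D))

gmm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0).fit(noisy)
post = gmm.predict_proba(noisy)          # (N, K) component responsibilities

# Per-component correction: posterior-weighted mean of (clean - noisy).
bias = (post.T @ (clean - noisy)) / post.sum(axis=0)[:, None]   # (K, D)

def enhance(y):
    """Piecewise (per-component) bias correction of noisy features y, shape (T, D)."""
    return y + gmm.predict_proba(y) @ bias

test_clean = rng.standard_normal((1000, D))
test_noisy = test_clean + 0.8 * np.tanh(test_clean) + 0.2 * rng.standard_normal((1000, D))
print("MSE before: %.3f  after: %.3f"
      % (np.mean((test_noisy - test_clean) ** 2),
         np.mean((enhance(test_noisy) - test_clean) ** 2)))
```
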