IEEE Transactions on Audio Speech and Language Processing最新文献_第3页

Cross Pattern Coherence Algorithm for Spatial Filtering Applications Utilizing Microphone Arrays 利用传声器阵列进行空间滤波的交叉模式相干算法

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2277928

Symeon Delikaris-Manias, V. Pulkki

{"title":"Cross Pattern Coherence Algorithm for Spatial Filtering Applications Utilizing Microphone Arrays","authors":"Symeon Delikaris-Manias, V. Pulkki","doi":"10.1109/TASL.2013.2277928","DOIUrl":"https://doi.org/10.1109/TASL.2013.2277928","url":null,"abstract":"A parametric spatial filtering algorithm with a fixed beam direction is proposed in this paper. The algorithm utilizes the normalized cross-spectral density between signals from microphones of different orders as a criterion for focusing in specific directions. The correlation between microphone signals is estimated in the time-frequency domain. A post-filter is calculated from a multichannel input and is used to assign attenuation values to a coincidentally captured audio signal. The proposed algorithm is simple to implement and offers the capability of coping with interfering sources at different azimuthal locations with or without the presence of diffuse sound. It is implemented by using directional microphones placed in the same look direction and have the same magnitude and phase response. Experiments are conducted with simulated and real microphone arrays employing the proposed post-filter and compared to previous coherence-based approaches, such as the McCowan post-filter. A significant improvement is demonstrated in terms of objective quality measures. Formal listening tests conducted to assess the audibility of artifacts of the proposed algorithm in real acoustical scenarios show that no annoying artifacts existed with certain spectral floor values. Examples of the proposed algorithm can be found online at http://www.acoustics.hut.fi/projects/cropac/soundExamples.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2356-2367"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2277928","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 27

Passive Temporal Offset Estimation of Multichannel Recordings of an Ad-Hoc Microphone Array Ad-Hoc麦克风阵列多通道记录的无源时间偏移估计

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASLP.2013.2286921

Pasi Pertilä, M. Hämäläinen, Mikael Mieskolainen

{"title":"Passive Temporal Offset Estimation of Multichannel Recordings of an Ad-Hoc Microphone Array","authors":"Pasi Pertilä, M. Hämäläinen, Mikael Mieskolainen","doi":"10.1109/TASLP.2013.2286921","DOIUrl":"https://doi.org/10.1109/TASLP.2013.2286921","url":null,"abstract":"In recent years ad-hoc microphone arrays have become ubiquitous, and the capture hardware and quality is increasingly more sophisticated. Ad-hoc arrays hold a vast potential for audio applications, but they are inherently asynchronous, i.e., temporal offset exists in each channel, and furthermore the device locations are generally unknown. Therefore, the data is not directly suitable for traditional microphone array applications such as source localization and beamforming. This work presents a least squares method for temporal offset estimation of a static ad-hoc microphone array. The method utilizes the captured audio content without the need to emit calibration signals, provided that during the recording a sufficient amount of sound sources surround the array. The Cramer-Rao lower bound of the estimator is given and the effect of limited number of surrounding sources on the solution accuracy is investigated. A practical implementation is then presented using non-linear filtering with automatic parameter adjustment. Simulations over a range of reverberation and noise levels demonstrate the algorithm's robustness. Using smartphones an average RMS error of 3.5 samples (at 48 kHz) was reached when the algorithm's assumptions were met.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2393-2402"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASLP.2013.2286921","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62892231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 36

Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering 二阶优化凸矩阵函数和稀疏协方差聚类方法

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2263142

Gillian M. Chin, J. Nocedal, P. Olsen, Steven J. Rennie

{"title":"Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering","authors":"Gillian M. Chin, J. Nocedal, P. Olsen, Steven J. Rennie","doi":"10.1109/TASL.2013.2263142","DOIUrl":"https://doi.org/10.1109/TASL.2013.2263142","url":null,"abstract":"A variety of first-order methods have recently been proposed for solving matrix optimization problems arising in machine learning. The premise for utilizing such algorithms is that second order information is too expensive to employ, and so simple first-order iterations are likely to be optimal. In this paper, we argue that second-order information is in fact efficiently accessible in many matrix optimization problems, and can be effectively incorporated into optimization algorithms. We begin by reviewing how certain Hessian operations can be conveniently represented in a wide class of matrix optimization problems, and provide the first proofs for these results. Next we consider a concrete problem, namely the minimization of the ℓ1 regularized Jeffreys divergence, and derive formulae for computing Hessians and Hessian vector products. This allows us to propose various second order methods for solving the Jeffreys divergence problem. We present extensive numerical results illustrating the behavior of the algorithms and apply the methods to a speech recognition problem. We compress full covariance Gaussian mixture models utilized for acoustic models in automatic speech recognition. By discovering clusters of (sparse inverse) covariance matrices, we can compress the number of covariance parameters by a factor exceeding 200, while still outperforming the word error rate (WER) performance of a diagonal covariance model that has 20 times less covariance parameters than the original acoustic model.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"123 1","pages":"2244-2254"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2263142","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62889848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Distributional Semantic Models for Affective Text Analysis 情感文本分析的分布语义模型

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2277931

Nikos Malandrakis, A. Potamianos, Elias Iosif, Shrikanth S. Narayanan

{"title":"Distributional Semantic Models for Affective Text Analysis","authors":"Nikos Malandrakis, A. Potamianos, Elias Iosif, Shrikanth S. Narayanan","doi":"10.1109/TASL.2013.2277931","DOIUrl":"https://doi.org/10.1109/TASL.2013.2277931","url":null,"abstract":"We present an affective text analysis model that can directly estimate and combine affective ratings of multi-word terms, with application to the problem of sentence polarity/semantic orientation detection. Starting from a hierarchical compositional method for generating sentence ratings, we expand the model by adding multi-word terms that can capture non-compositional semantics. The method operates similarly to a bigram language model, using bigram terms or backing off to unigrams based on a (degree of) compositionality criterion. The affective ratings for n-gram terms of different orders are estimated via a corpus-based method using distributional semantic similarity metrics between unseen words and a set of seed words. N-gram ratings are then combined into sentence ratings via simple algebraic formulas. The proposed framework produces state-of-the-art results for word-level tasks in English and German and the sentence-level news headlines classification SemEval'07-Task14 task. The inclusion of bigram terms to the model provides significant performance improvement, even if no term selection is applied.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2379-2392"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2277931","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 69

Introduction to the Special Section on Large-Scale Optimization for Audio, Speech, and Language Processing 关于音频、语音和语言处理大规模优化的专题介绍

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2283631

D. Kanevsky, Xiaodong He, G. Heigold, Haizhou Li, Stephen J. Wright

引用次数: 0

Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks 提高大型语音任务深度神经网络训练速度的优化技术

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2284378

Tara N. Sainath, Brian Kingsbury, H. Soltau, B. Ramabhadran

{"title":"Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks","authors":"Tara N. Sainath, Brian Kingsbury, H. Soltau, B. Ramabhadran","doi":"10.1109/TASL.2013.2284378","DOIUrl":"https://doi.org/10.1109/TASL.2013.2284378","url":null,"abstract":"While Deep Neural Networks (DNNs) have achieved tremendous success for large vocabulary continuous speech recognition (LVCSR) tasks, training these networks is slow. Even to date, the most common approach to train DNNs is via stochastic gradient descent, serially on one machine. Serial training, coupled with the large number of training parameters (i.e., 10-50 million) and speech data set sizes (i.e., 20-100 million training points) makes DNN training very slow for LVCSR tasks. In this work, we explore a variety of different optimization techniques to improve DNN training speed. This includes parallelization of the gradient computation during cross-entropy and sequence training, as well as reducing the number of parameters in the network using a low-rank matrix factorization. Applying the proposed optimization techniques, we show that DNN training can be sped up by a factor of 3 on a 50-hour English Broadcast News (BN) task with no loss in accuracy. Furthermore, using the proposed techniques, we are able to train DNNs on a 300-hr Switchboard (SWB) task and a 400-hr English BN task, showing improvements between 9-30% relative over a state-of-the art GMM/HMM system while the number of parameters of the DNN is smaller than the GMM/HMM system.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2267-2276"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2284378","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62892587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 61

A Difference of Convex Functions Approach to Large-Scale Log-Linear Model Estimation 大规模对数线性模型估计的凸函数差分法

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2271592

Theodoros Tsiligkaridis, E. Marcheret, V. Goel

{"title":"A Difference of Convex Functions Approach to Large-Scale Log-Linear Model Estimation","authors":"Theodoros Tsiligkaridis, E. Marcheret, V. Goel","doi":"10.1109/TASL.2013.2271592","DOIUrl":"https://doi.org/10.1109/TASL.2013.2271592","url":null,"abstract":"We introduce a new class of parameter estimation methods for log-linear models. Our approach relies on the fact that minimizing a rational function of mixtures of exponentials is equivalent to minimizing a difference of convex functions. This allows us to construct convex auxiliary functions by applying the concave-convex procedure (CCCP). We consider a modification of CCCP where a proximal term is added (ProxCCCP), and extend it further by introducing an ℓ1 penalty. For solving the ` convex + ℓ1' auxiliary problem, we propose an approach called SeqGPSR that is based on sequential application of the GPSR procedure. We present convergence analysis of the algorithms, including sufficient conditions for convergence to a critical point of the objective function. We propose an adaptive procedure for varying the strength of the proximal regularization term in each ProxCCCP iteration, and show this procedure (AProxCCCP) is effective in practice and stable under some mild conditions. The CCCP procedure and proposed variants are applied to the task of optimizing the cross-entropy objective function for an audio frame classification problem. Class posteriors are modeled using log-linear models consisting of approximately 6 million parameters. Our results show that CCCP variants achieve a much better cross-entropy objective value as compared to direct optimization of the objective function by a first order gradient based approach, stochastic gradient descent or the L-BFGS procedure.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2255-2266"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2271592","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Room Reverberation Reconstruction: Interpolation of the Early Part Using Compressed Sensing 室内混响重建:使用压缩传感的早期部分插值

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2273662

R. Mignot, L. Daudet, F. Ollivier

引用次数: 50

Optimization Algorithms and Applications for Speech and Language Processing 语音和语言处理的优化算法和应用

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2283777

Stephen J. Wright, D. Kanevsky, L. Deng, Xiaodong He, G. Heigold, Haizhou Li

引用次数: 28

Diffused Sensing for Sharp Directive Beamforming 锐利指令波束形成的扩散传感

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2274695

K. Niwa, Yusuke Hioka, K. Furuya, Y. Haneda

{"title":"Diffused Sensing for Sharp Directive Beamforming","authors":"K. Niwa, Yusuke Hioka, K. Furuya, Y. Haneda","doi":"10.1109/TASL.2013.2274695","DOIUrl":"https://doi.org/10.1109/TASL.2013.2274695","url":null,"abstract":"We generalized our previously proposed diffused sensing for a microphone array design to achieve sharp directive beamforming to enable various filter design methods to be applied. In the conventional microphone array, various filter design methods have been studied to narrow the directivity beam width. However, it is difficult to minimize the power of interference sources in the beamforming output (output interference power) over a broad frequency range since the cross-correlation between transfer functions from sound sources to microphones increases in some frequencies. With the diffused sensing, the cross-correlation is minimized by physically varying the transfer functions. We investigated how a microphone array should be designed in order to minimize the cross-correlation between transfer functions and found that placing the array in a diffuse acoustic field produces optimum results. Because the transfer functions are known a priori, this finding makes it possible to narrow the directivity beam width over a broad frequency range. This observation can be practically achieved by placing microphones inside a reflective enclosure, part of which is open to let sound waves enter. We conducted experiments using 24 microphones and confirmed that the output interference power was reduced over a broad frequency range and the beam width was narrowed by using the diffused sensing.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"2346-2355"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2274695","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62891867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 13