{"title":"Improvements to filterbank and delta learning within a deep neural network framework","authors":"Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, G. Saon, B. Ramabhadran","doi":"10.1109/ICASSP.2014.6854925","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854925","url":null,"abstract":"Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using the filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"54 1","pages":"6839-6843"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75775499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency-shift filtering for OFDM recovery in narrowband power line communications","authors":"Nir Shlezinger, R. Dabora","doi":"10.1109/ICASSP.2014.6855173","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6855173","url":null,"abstract":"Power line communications (PLC) has been drawing considerable interest in recent years due to the growing interest in smart grid implementation. In smart grids, network control and grid applications are allocated the frequency band of 0-500 kHz, commonly referred to as the narrowband PLC channel. This channel is characterized by strong periodic noise and low signal to noise ratio (SNR). In this work we propose a receiver which uses frequency shift filtering to exploit the cyclostationary properties of both the narrowband PLC noise, as well as the information signal, digitally modulated using orthogonal frequency division multiplexing. The results show that the new receiver obtains a substantial performance gain over previously proposed receivers, without requiring any coordination with the transmitter.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"29 1","pages":"8073-8077"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75794418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstruction of sparse signals from highly corrupted measurements by nonconvex minimization","authors":"Marko Filipovic","doi":"10.1109/ICASSP.2014.6854230","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854230","url":null,"abstract":"We propose a method for signal recovery in compressed sensing when measurements can be highly corrupted. It is based on ℓ_p minimization for 0 < p ≤ 1. Since it was shown that ℓ_p minimization performs better than ℓ_1 minimization when there are no large errors, the proposed approach is a natural extension to compressed sensing with corruptions. We provide a theoretical justification of this idea, based on analogous reasoning as in the case when measurements are not corrupted by large errors. Better performance of the proposed approach compared to ℓ_1 minimization is illustrated in numerical experiments.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"3395-3399"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74659349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A computationally efficient calibration algorithm for the LOFAR radio astronomical array","authors":"Yuntao Wu, Amir Leshem, S. Wijnholds","doi":"10.1109/ICASSP.2014.6854635","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854635","url":null,"abstract":"In this paper, the problem of self-calibration for large astronomical arrays such as the Dutch Low Frequency Array (LOFAR) is considered. We assume direction dependent gain and phase errors which need to be estimated and calibrated out. Combining the subspace fitting and least square approaches, the signal subspace of the received single short-term interval (STI) sample data of the LOFAR is used to build a cost function whose minimizer is a statistically efficient estimator of the unknown parameters (the gains and phases of the telescopes). Subsequently, an iterative algorithm for finding the minimum of the cost function is presented and the unknown calibration parameters of both the core stations and the external subarray are separated. As a result, the computational complexity of the proposed method is significantly reduced compared to the existing methods based on a direct covariance fitting. Finally, the performance of the proposed method is compared with the conventional peeling method in computer simulation. An example for calibrating the core of the LOFAR array on Cyg A is also provided.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"5402-5406"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74875531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retrieving the syntactic structure of erroneous ASR transcriptions for open-domain Spoken Language Understanding","authors":"Frédéric Béchet, Benoit Favre, Alexis Nasr, Mathieu Morey","doi":"10.1109/ICASSP.2014.6854372","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854372","url":null,"abstract":"Retrieving the syntactic structure of erroneous ASR transcriptions can be of great interest for open-domain Spoken Language Understanding tasks in order to correct or at least reduce the impact of ASR errors on final applications. Most of the previous works on ASR and syntactic parsing have addressed this problem by using syntactic features during ASR to help reduce Word Error Rate (WER). The improvement obtained is often rather small; however, the structure and the relations between words obtained through parsing can be of great interest for the SLU processes, even without a significant decrease of WER. That is why we adopt another point of view in this paper: considering that ASR transcriptions inevitably contain some errors, we show in this study that it is possible to improve the syntactic analysis of these erroneous transcriptions by performing a joint error detection / syntactic parsing process. The applicative framework used in this study is a speech-to-speech system developed through the DARPA BOLT project.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"4097-4101"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73007246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigation of unsupervised adaptation of DNN acoustic models with filter bank input","authors":"Takuya Yoshioka, A. Ragni, M. Gales","doi":"10.1109/ICASSP.2014.6854825","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854825","url":null,"abstract":"Adaptation to speaker variations is an essential component of speech recognition systems. One common approach to adapting deep neural network (DNN) acoustic models is to perform global constrained maximum likelihood linear regression (CMLLR) at some point of the systems. Using CMLLR (or more generally, generative approaches) is advantageous especially in unsupervised adaptation scenarios with high baseline error rates. On the other hand, as the DNNs are less sensitive to the increase in the input dimensionality than GMMs, it is becoming more popular to use rich speech representations, such as log mel-filter bank channel outputs, instead of conventional low-dimensional feature vectors, such as MFCCs and PLP coefficients. This work discusses and compares three different configurations of DNN acoustic models that allow CMLLR-based speaker adaptive training (SAT) to be performed in systems with filter bank inputs. Results of unsupervised adaptation experiments conducted on three different data sets are presented, demonstrating that, by choosing an appropriate configuration, SAT with CMLLR can improve the performance of a well-trained filter bank-based speaker independent DNN system by 10.6% relative in a challenging task with a baseline error rate above 40%. It is also shown that the filter bank features are more advantageous than the conventional features even when they are used with SAT models. Some other insights are also presented, including the effects of block diagonal transforms and system combination.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"6344-6348"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73112069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved low-delay MDCT-based coding of both stationary and transient audio signals","authors":"Christian R. Helmrich, Goran Markovic, B. Edler","doi":"10.1109/ICASSP.2014.6854948","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854948","url":null,"abstract":"General-purpose MDCT-based audio coders like MP3 or HE-AAC utilize long inter-transform overlap and lookahead-based transform length switching to provide good coding quality for both stationary and non-stationary, i.e., transient, input signals even at low bitrates. In low-delay communication scenarios such as Voice over IP, however, algorithmic delay due to framing and overlap typically needs to be reduced and additional lookahead must be avoided. We show that these restrictions limit the performance of contemporary low-delay transform coders on either stationary or transient material and propose 3 modifications: an improved noise substitution technique and increased overlap between “long” transforms for stationary, and “long to short” transform length switching without lookahead and directly from the long overlap for transient frames. A listening test indicates the merit of these changes when integrated into AAC-LD.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"6954-6958"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75314659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New bivariate statistical model of natural image correlations","authors":"Che-Chun Su, L. Cormack, A. Bovik","doi":"10.1109/ICASSP.2014.6854627","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854627","url":null,"abstract":"We perform bivariate statistical analysis and modeling of the joint distributions of spatially adjacent sub-band responses for both luminance/chrominance and range data in natural scenes. In particular, we introduce a multivariate generalized Gaussian distribution and an exponentiated sine function to model the underlying statistics and correlations. The experimental results show that the bivariate statistics relating spatially adjacent pixels in both 2D color images and range maps are well described by the proposed models. We validate the robustness of the proposed bivariate models using a multi-variate statistical hypothesis test, and further demonstrate their effectiveness with application to a prototype depth estimation algorithm.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"70 8 1","pages":"5362-5366"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75566605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective analytic signal construction from a non-uniformly sampled bandpass signal","authors":"Jean-Adrien Vernhes, M. Chabert, B. Lacaze, G. Lesthievent, R. Baudin","doi":"10.1109/ICASSP.2014.6854549","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854549","url":null,"abstract":"This paper proposes a method that simultaneously builds the analytic signal from non-uniform samples of a bandpass signal and rejects interferences. The analytic signal is required for many onboard operations in communication satellites. This method operates in the time domain and without preliminary demodulation, using Periodic Non-uniform Sampling of order 2 (PNS2). This non-uniform sampling scheme can be easily implemented with available devices. Exact formulas for the analytic signal construction are derived for an infinite observation window (an infinite number of samples). For practical applications, the formulas should also demonstrate a high convergence rate due to the finite observation window. Formulas with increasing convergence rates are thus derived. The proposed method has been tested through simulations according to the number of available samples, the interference parameters and the filter transfer function regularity.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"3 1","pages":"4978-4982"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75580492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lossless/near-lossless color image coding by inverse demosaicing","authors":"Ryo Kuroiwa, Ryo Matsuoka, Seisuke Kyochi, K. Shirai, M. Okuda","doi":"10.1109/ICASSP.2014.6853951","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853951","url":null,"abstract":"In this paper, we introduce a novel framework for lossless/near-lossless (LS/NLS) color image coding assisted by an inverse demosaicing. Conventional frameworks are typically based on prediction (and quantization for NLS coding) followed by entropy coding, such as the JPEG-LS for bit rate saving. The approach of this work is totally different from the conventional ones. Basically, color images are created by demosaicing Bayer-pattern color filter array (CFA) whose operator can be expressed as square matrices. By using the (pseudo) inverse matrix of a joint demosaicing and color-to-gray conversion, the proposed decoder can recover the color image from its corresponding gray image data which is losslessly transmitted by the proposed encoder. Thus, LS/NLS color image reconstruction can be achieved while saving a bit rate significantly. In addition, using the same framework of color image coding, LS/NLS CFA coding can be realized by a comparable bit rate with JPEG-LS.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"34 1","pages":"2011-2014"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73720710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}