{"title":"A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training","authors":"Yu Zhang, Jian Xu, Zhijie Yan, Qiang Huo","doi":"10.1109/ICASSP.2012.6288814","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288814","url":null,"abstract":"Recently, we proposed an i-vector approach to acoustic sniffing for irrelevant variability normalization based acoustic model training in large vocabulary continuous speech recognition (LVCSR). Its effectiveness has been confirmed by experimental results on Switchboard- 1 conversational telephone speech transcription task. In this paper, we study several discriminative feature extraction approaches in i-vector space to improve both recognition accuracy and run-time efficiency. New experimental results are reported on a much larger scale LVCSR task with about 2000 hours training data.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"75 1","pages":"4077-4080"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83603373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering","authors":"M. Togami, Y. Kawaguchi, Ryu Takeda, Y. Obuchi, N. Nukaga","doi":"10.1109/ICASSP.2012.6288809","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288809","url":null,"abstract":"In this paper, we propose a multichannel speech dereverberation and separation technique which is effective even when there are multiple speakers and each speaker's transfer function is time-varying due to fluctuation of the corresponding speaker's head. For robustness against fluctuation, the proposed method optimizes linear filtering with non-linear filtering simultaneously from probabilistic perspective based on a probabilistic reverberant transfer-function model, PRTFM. PRTFM is an extension of the conventional time-invariant transfer-function model under uncertain conditions, and PRTFM can be also regarded as an extension of recently proposed blind local Gaussian modeling. The linear filtering and the non-linear filtering are optimized in MMSE (Minimum Mean Square Error) sense during parameter optimization. The proposed method is evaluated in a reverberant meeting room, and the proposed method is shown to be effective.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"53 1","pages":"4057-4060"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76287121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trade-off evaluation for speech enhancement algorithms with respect to the a priori SNR estimation","authors":"Pei Chee Yong, S. Nordholm, H. H. Dam","doi":"10.1109/ICASSP.2012.6288957","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288957","url":null,"abstract":"In this paper, a modified a priori SNR estimator is proposed for speech enhancement. The well-known decision-directed (DD) approach is modified by matching each gain function with the noisy speech spectrum at current frame rather than the previous one. The proposed algorithm eliminates the speech transient distortion and reduces the impact from the choice of the gain function towards the level of smoothing in the SNR estimate. An objective evaluation metric is employed to measure the trade-off between musical noise, noise reduction and speech distortion. Performance is evaluated and compared between a modified sigmoid gain function, the state-of-the-art log-spectral amplitude estimator and the Wiener filter. Simulation results show that the modified DD approach performs better in terms of the trade-off evaluation.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"301 1","pages":"4657-4660"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73598202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalized amplitude modulation features for large vocabulary noise-robust speech recognition","authors":"V. Mitra, H. Franco, M. Graciarena, Arindam Mandal","doi":"10.1109/ICASSP.2012.6288824","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288824","url":null,"abstract":"Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variabilities compared to automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the speech recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared with respect to state-of-the-art noise-robust features in Aurora-2 and a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both a clean and artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system. The experiments were performed under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all the training-test conditions of renoised WSJ data and also improved digit recognition accuracies for Aurora-2 compared to the MFCCs and state-of-the-art noise-robust features.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"4117-4120"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85498682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A family of Bounded Component Analysis algorithms","authors":"A. Erdogan","doi":"10.1109/ICASSP.2012.6288270","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288270","url":null,"abstract":"Bounded Component Analysis (BCA) has recently been introduced as an alternative method for the Blind Source Separation problem. Under the generic assumption on source boundedness, BCA provides a flexible framework for the separation of dependent (even correlated) as well as independent sources. This article provides a family of algorithms derived based on the geometric picture implied by the founding assumptions of the BCA approach. We also provide a numerical example demonstrating the ability of the proposed algorithms to separate mixtures of some dependent sources.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"28 1","pages":"1881-1884"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85551713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of synthesizable hardware implementation from high level RVC-cal description","authors":"Khaled Jerbi, M. Raulet, O. Déforges, M. Abid","doi":"10.1109/ICASSP.2012.6288199","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288199","url":null,"abstract":"Data process algorithms are increasing in complexity especially for image and video coding. Therefore, hardware development using directly hardware description languages (HDL) such as VHDL or Verilog is a difficult task. Current research axes in this context are introducing new methodologies to automate the generation of such descriptions. In our work we adopted a high level and target-independent language called CAL (Caltrop Actor Language). This language is associated with a set of tools to easily design dataflow applications and also a hardware compiler to automatically generate the implementation. Before the modifications presented in this paper, the existing CAL hardware back-end did not support some high-level features of the CAL language. Consequently, high-level designed actors had to be manually transformed to be synthesizable. In this paper, we introduce a general automatic transformation of CAL descriptions to make these structures compliant and synthesizable. This transformation analyses the CAL code, detects the target features and makes the required changes to obtain synthesizable code while keeping the same application behavior. This work resolves the main bottleneck of the hardware generation flow from CAL designs.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2 1","pages":"1597-1600"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85692792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph spectral compressed sensing for sensor networks","authors":"Xiaofan Zhu, M. Rabbat","doi":"10.1109/ICASSP.2012.6288515","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288515","url":null,"abstract":"Consider a wireless sensor network with N sensor nodes measuring data which are correlated temporally or spatially. We consider the problem of reconstructing the original data by only transmitting M ≪ N sensor readings while guaranteeing that the reconstruction error is small. Assuming the original signal is “smooth” with respect to the network topology, our approach is to gather measurements from a random subset of nodes and then interpolate with respect to the graph Laplacian eigenbasis, leveraging ideas from compressed sensing. We propose algorithms for both temporally and spatially correlated signals, and the performance of these algorithms is verified using both synthesized data and real world data. Significant savings are made in terms of energy resources, bandwidth, and query latency.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"58 1","pages":"2865-2868"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84595318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noncoherent misbehavior detection in space-time coded cooperative networks","authors":"Li-Chung Lo, Zhao-Jie Wang, Wan-Jen Huang","doi":"10.1109/ICASSP.2012.6288561","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288561","url":null,"abstract":"Consider a two-relay decode-and-forward (DF) cooperative network where Alamouti coding is adopted among relays to exploit spatial diversity. However, the spatial diversity gain is diminished with the existence of misbehaving relays. Most existing work on detecting malicious relays requires the knowledge of instantaneous channel status, which is usually unavailable if the relays garble retransmitted signals deliberately. With this regard, we propose a noncoherent misbehavior detection using the second-order statistics of channel estimates for relay-destination links. It shows from simulation results that increasing the number of received blocks provides significant improvement even at low SNR regime.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"3061-3064"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77671094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting passive eavesdroppers in the MIMO wiretap channel","authors":"A. Mukherjee, A. L. Swindlehurst","doi":"10.1109/ICASSP.2012.6288501","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288501","url":null,"abstract":"The MIMO wiretap channel comprises a passive eavesdropper that attempts to intercept communications between an authorized transmitter-receiver pair, with each node being equipped with multiple antennas. In a dynamic network, it is imperative that the presence of a passive eavesdropper be determined before the transmitter can deploy robust secrecy-encoding schemes as a countermeasure. This is a difficult task in general, since by definition the eavesdropper is passive and never transmits. In this work we adopt a method that allows the legitimate nodes to detect the passive eavesdropper from the local oscillator power that is inadvertently leaked from its RF front end. We examine the performance of non-coherent energy detection as well as optimal coherent detection schemes. We then show how the proposed detectors allow the legitimate nodes to increase the MIMO secrecy rate of the channel.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"2809-2812"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78154830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLLR transforms of self-organized units as features in speaker recognition","authors":"M. Siu, Omer Lang, H. Gish, S. Lowe, Arthur Chan, O. Kimball","doi":"10.1109/ICASSP.2012.6288891","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288891","url":null,"abstract":"Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well and can also provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. In order to estimate the adaptation parameters, a speech recognizer in the SR domain is required which in turn requires transcribed training data for recognizer training. This limits the approach only to domains where training transcriptions are available. To generalize the adaptation parameter approach to domains without transcriptions, we propose the use of self-organized unit recognizers that can be trained without supervision (or transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that using MLLR parameters estimated from SOU recognizers give comparable performance to systems using a matched recognizers. SOU recognizers also outperform those using cross-lingual recognizers. When we fused the SOU- and word recognizers, SR equal error rate (EER) can be reduced by another 15%. This suggests SOU recognizers can be useful whether or not transcribed data for recognition training are available.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"267 1","pages":"4385-4388"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72910724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}