2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) — Latest Articles

A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training
Yu Zhang, Jian Xu, Zhijie Yan, Qiang Huo
DOI: 10.1109/ICASSP.2012.6288814 · pp. 4077-4080 · 2012-03-25
Abstract: Recently, we proposed an i-vector approach to acoustic sniffing for irrelevant variability normalization based acoustic model training in large vocabulary continuous speech recognition (LVCSR). Its effectiveness has been confirmed by experimental results on a Switchboard-1 conversational telephone speech transcription task. In this paper, we study several discriminative feature extraction approaches in i-vector space to improve both recognition accuracy and run-time efficiency. New experimental results are reported on a much larger scale LVCSR task with about 2000 hours of training data.
Citations: 0
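The abstract does not name the specific projections studied, but linear discriminant analysis (LDA) is a standard discriminative feature extraction in i-vector space. A minimal Fisher-LDA sketch under that assumption (the toy data, dimensions, and ridge term are illustrative, not the paper's setup):

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA: directions maximizing between-class over
    within-class scatter, via the generalized eigenproblem Sb v = l Sw v."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Small ridge on Sw for numerical stability (assumption, not in the paper)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real

# Toy "i-vectors": two acoustic conditions, 50 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 50)) + m for m in (0.0, 2.0)])
y = np.array([0] * 30 + [1] * 30)
W = lda_projection(X, y, n_components=1)
Z = X @ W  # low-dimensional features used for acoustic sniffing
```

The projected classes separate cleanly because the LDA direction aligns with the (whitened) mean difference between conditions.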
Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering
M. Togami, Y. Kawaguchi, Ryu Takeda, Y. Obuchi, N. Nukaga
DOI: 10.1109/ICASSP.2012.6288809 · pp. 4057-4060 · 2012-03-25
Abstract: In this paper, we propose a multichannel speech dereverberation and separation technique which is effective even when there are multiple speakers and each speaker's transfer function is time-varying due to movement of the corresponding speaker's head. For robustness against this fluctuation, the proposed method jointly optimizes linear and non-linear filtering from a probabilistic perspective based on a probabilistic reverberant transfer-function model, PRTFM. PRTFM is an extension of the conventional time-invariant transfer-function model to uncertain conditions, and can also be regarded as an extension of recently proposed blind local Gaussian modeling. The linear and non-linear filtering are optimized in the MMSE (Minimum Mean Square Error) sense during parameter optimization. The proposed method is evaluated in a reverberant meeting room and shown to be effective.
Citations: 10
Trade-off evaluation for speech enhancement algorithms with respect to the a priori SNR estimation
Pei Chee Yong, S. Nordholm, H. H. Dam
DOI: 10.1109/ICASSP.2012.6288957 · pp. 4657-4660 · 2012-03-25
Abstract: In this paper, a modified a priori SNR estimator is proposed for speech enhancement. The well-known decision-directed (DD) approach is modified by matching each gain function with the noisy speech spectrum at the current frame rather than the previous one. The proposed algorithm eliminates speech transient distortion and reduces the impact of the choice of gain function on the level of smoothing in the SNR estimate. An objective evaluation metric is employed to measure the trade-off between musical noise, noise reduction, and speech distortion. Performance is evaluated and compared between a modified sigmoid gain function, the state-of-the-art log-spectral amplitude estimator, and the Wiener filter. Simulation results show that the modified DD approach performs better in terms of the trade-off evaluation.
Citations: 6
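The baseline that the paper modifies is the classic decision-directed recursion of Ephraim and Malah, which blends the previous frame's clean-speech estimate with the current maximum-likelihood SNR term. A minimal sketch of that baseline (not the proposed modification; the smoothing factor α = 0.98 and the Wiener gain are conventional choices, assumed here):

```python
import numpy as np

def decision_directed_snr(noisy_psd, noise_psd, alpha=0.98):
    """Classic decision-directed a priori SNR estimate.
    noisy_psd: |Y(k,l)|^2 as frames x bins; noise_psd: noise PSD lambda_d(k).
    xi(k,l) = alpha * |S_hat(k,l-1)|^2 / lambda_d + (1-alpha) * max(gamma-1, 0)."""
    n_frames, n_bins = noisy_psd.shape
    xi = np.zeros_like(noisy_psd)
    prev_clean = np.zeros(n_bins)  # |S_hat(k, l-1)|^2, zero for the first frame
    for l in range(n_frames):
        gamma = noisy_psd[l] / noise_psd             # a posteriori SNR
        ml_term = np.maximum(gamma - 1.0, 0.0)       # maximum-likelihood term
        xi[l] = alpha * prev_clean / noise_psd + (1.0 - alpha) * ml_term
        gain = xi[l] / (1.0 + xi[l])                 # Wiener gain (assumed choice)
        prev_clean = (gain ** 2) * noisy_psd[l]      # clean-speech PSD estimate
    return xi

# Toy demo: noise-only periodogram with power 0.5 in each of 8 bins
rng = np.random.default_rng(1)
noisy = rng.chisquare(2, (20, 8)) * 0.25
xi = decision_directed_snr(noisy, np.full(8, 0.5))
```

The paper's modification pairs each gain function with the current-frame spectrum instead of the one-frame-delayed estimate used above.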
Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
V. Mitra, H. Franco, M. Graciarena, Arindam Mandal
DOI: 10.1109/ICASSP.2012.6288824 · pp. 4117-4120 · 2012-03-25
Abstract: Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variabilities compared to automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the speech recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared against state-of-the-art noise-robust features on Aurora-2 and a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both a clean and an artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system. The experiments were performed under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all the training-test conditions of the renoised WSJ data and also improved digit recognition accuracies for Aurora-2 compared to MFCCs and state-of-the-art noise-robust features.
Citations: 104
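The Teager-Kaiser energy operator at the core of the NMCC front end has the closed form Ψ[x(n)] = x(n)² − x(n−1)·x(n+1); for a pure tone A·cos(ωn) it returns the constant A²·sin²(ω), tracking both amplitude and frequency. A minimal sketch of just this operator (the full NMCC pipeline — filterbank, power normalization, cosine transform — is omitted):

```python
import numpy as np

def teager_energy(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    Output is 2 samples shorter than the input."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For A*cos(w*n) the operator yields exactly A^2 * sin(w)^2 at every sample
fs = 8000.0
n = np.arange(1024)
tone = 0.5 * np.cos(2 * np.pi * 500.0 * n / fs)
psi = teager_energy(tone)
```

The identity cos(θ−ω)·cos(θ+ω) = cos²θ − sin²ω makes the tone's Teager energy exactly constant, which is why the operator is a convenient amplitude-modulation tracker.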
A family of Bounded Component Analysis algorithms
A. Erdogan
DOI: 10.1109/ICASSP.2012.6288270 · pp. 1881-1884 · 2012-03-25
Abstract: Bounded Component Analysis (BCA) has recently been introduced as an alternative method for the Blind Source Separation problem. Under the generic assumption of source boundedness, BCA provides a flexible framework for the separation of dependent (even correlated) as well as independent sources. This article provides a family of algorithms derived from the geometric picture implied by the founding assumptions of the BCA approach. We also provide a numerical example demonstrating the ability of the proposed algorithms to separate mixtures of dependent sources.
Citations: 19
Automatic generation of synthesizable hardware implementation from high-level RVC-CAL description
Khaled Jerbi, M. Raulet, O. Déforges, M. Abid
DOI: 10.1109/ICASSP.2012.6288199 · pp. 1597-1600 · 2012-03-25
Abstract: Data processing algorithms are increasing in complexity, especially for image and video coding. Therefore, hardware development directly in hardware description languages (HDL) such as VHDL or Verilog is a difficult task. Current research in this area introduces new methodologies to automate the generation of such descriptions. In our work we adopted a high-level, target-independent language called CAL (Caltrop Actor Language). This language is associated with a set of tools to easily design dataflow applications, and with a hardware compiler to automatically generate the implementation. Before the modifications presented in this paper, the existing CAL hardware back-end did not support some high-level features of the CAL language; consequently, actors designed at a high level had to be transformed manually to be synthesizable. In this paper, we introduce a general automatic transformation of CAL descriptions that makes these structures compliant and synthesizable. The transformation analyzes the CAL code, detects the relevant features, and makes the required changes to obtain synthesizable code while preserving the application's behavior. This work resolves the main bottleneck of the hardware generation flow from CAL designs.
Citations: 13
Graph spectral compressed sensing for sensor networks
Xiaofan Zhu, M. Rabbat
DOI: 10.1109/ICASSP.2012.6288515 · pp. 2865-2868 · 2012-03-25
Abstract: Consider a wireless sensor network with N sensor nodes measuring data which are correlated temporally or spatially. We consider the problem of reconstructing the original data by transmitting only M ≪ N sensor readings while guaranteeing that the reconstruction error is small. Assuming the original signal is "smooth" with respect to the network topology, our approach is to gather measurements from a random subset of nodes and then interpolate with respect to the graph Laplacian eigenbasis, leveraging ideas from compressed sensing. We propose algorithms for both temporally and spatially correlated signals, and the performance of these algorithms is verified using both synthesized data and real-world data. Significant savings are made in terms of energy resources, bandwidth, and query latency.
Citations: 56
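The interpolation step can be sketched on a toy graph: a signal bandlimited to the first K Laplacian eigenvectors is sampled at M random nodes, and its spectral coefficients are recovered by least squares in that eigenbasis. The path graph, the sizes N, K, M, and the solver below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# Path graph with N nodes: Laplacian L = D - A
N, K, M = 64, 5, 20
A = np.zeros((N, N))
idx = np.arange(N - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: Laplacian eigenvectors, ordered by eigenvalue
# ("smooth" signals live on the low-eigenvalue columns of U)
evals, U = np.linalg.eigh(L)

# Synthesize a K-bandlimited (smooth) signal on the graph
rng = np.random.default_rng(42)
coeffs = rng.normal(size=K)
x = U[:, :K] @ coeffs

# Query only M << N nodes, then interpolate in the low-frequency eigenbasis
sample = rng.choice(N, size=M, replace=False)
c_hat, *_ = np.linalg.lstsq(U[sample, :K], x[sample], rcond=None)
x_hat = U[:, :K] @ c_hat
```

Because the M sampled rows of the K-column eigenvector matrix are (almost surely) full rank, the consistent least-squares system recovers the bandlimited signal exactly; real sensor data would only be approximately bandlimited, giving a small residual instead.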
Noncoherent misbehavior detection in space-time coded cooperative networks
Li-Chung Lo, Zhao-Jie Wang, Wan-Jen Huang
DOI: 10.1109/ICASSP.2012.6288561 · pp. 3061-3064 · 2012-03-25
Abstract: Consider a two-relay decode-and-forward (DF) cooperative network where Alamouti coding is adopted among the relays to exploit spatial diversity. The spatial diversity gain is diminished, however, in the presence of misbehaving relays. Most existing work on detecting malicious relays requires knowledge of the instantaneous channel status, which is usually unavailable if the relays deliberately garble the retransmitted signals. In this regard, we propose a noncoherent misbehavior detection scheme using the second-order statistics of channel estimates for the relay-destination links. Simulation results show that increasing the number of received blocks provides significant improvement even in the low-SNR regime.
Citations: 11
Detecting passive eavesdroppers in the MIMO wiretap channel
A. Mukherjee, A. L. Swindlehurst
DOI: 10.1109/ICASSP.2012.6288501 · pp. 2809-2812 · 2012-03-25
Abstract: The MIMO wiretap channel comprises a passive eavesdropper that attempts to intercept communications between an authorized transmitter-receiver pair, with each node being equipped with multiple antennas. In a dynamic network, it is imperative that the presence of a passive eavesdropper be determined before the transmitter can deploy robust secrecy-encoding schemes as a countermeasure. This is a difficult task in general, since by definition the eavesdropper is passive and never transmits. In this work we adopt a method that allows the legitimate nodes to detect the passive eavesdropper from the local oscillator power that is inadvertently leaked from its RF front end. We examine the performance of non-coherent energy detection as well as optimal coherent detection schemes. We then show how the proposed detectors allow the legitimate nodes to increase the MIMO secrecy rate of the channel.
Citations: 147
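A minimal noncoherent energy detector of the kind examined can be sketched as follows: sum the received energy and compare it to a threshold chosen for a target false-alarm rate. The central-limit-theorem threshold and the false-alarm parameter below are common textbook assumptions, not the paper's derivation:

```python
import numpy as np
from statistics import NormalDist

def energy_detect(y, noise_var, pfa=0.01):
    """Noncoherent energy detection of a weak leaked carrier.
    Under noise-only (complex Gaussian, variance noise_var per sample),
    T = sum |y|^2 is approximately Normal(N*s2, N*s2^2) by the CLT,
    so the threshold for false-alarm probability pfa is
    N*s2 + Q^{-1}(pfa) * sqrt(N) * s2."""
    n = len(y)
    statistic = np.sum(np.abs(y) ** 2)
    q = NormalDist().inv_cdf(1.0 - pfa)  # Gaussian tail quantile
    threshold = n * noise_var + q * np.sqrt(n) * noise_var
    return statistic > threshold

# Sanity check: an all-zero observation never exceeds the threshold
assert not energy_detect(np.zeros(100, dtype=complex), noise_var=1.0)
```

In practice the detector would average many observation blocks, since the leaked local-oscillator power is far below the noise floor; the coherent detectors studied in the paper exploit knowledge of the leakage waveform instead.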
MLLR transforms of self-organized units as features in speaker recognition
M. Siu, Omer Lang, H. Gish, S. Lowe, Arthur Chan, O. Kimball
DOI: 10.1109/ICASSP.2012.6288891 · pp. 4385-4388 · 2012-03-25
Abstract: Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well, and can also provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. To estimate the adaptation parameters, a speech recognizer in the SR domain is required, which in turn requires transcribed training data for recognizer training. This limits the approach to domains where training transcriptions are available. To generalize the adaptation-parameter approach to domains without transcriptions, we propose the use of self-organized unit (SOU) recognizers that can be trained without supervision (or transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that using MLLR parameters estimated from SOU recognizers gives performance comparable to systems using matched recognizers. SOU recognizers also outperform those using cross-lingual recognizers. When the SOU and word recognizers are fused, the SR equal error rate (EER) is reduced by a further 15%. This suggests SOU recognizers can be useful whether or not transcribed data for recognizer training are available.
Citations: 2