2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): latest papers

Attaining fundamental bounds on timing synchronization
P. Bidigare, Upamanyu Madhow, R. Mudumbai, D. Scherber
{"title":"Attaining fundamental bounds on timing synchronization","authors":"P. Bidigare, Upamanyu Madhow, R. Mudumbai, D. Scherber","doi":"10.1109/ICASSP.2012.6289099","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6289099","url":null,"abstract":"In this paper, we propose an algorithm for timing synchronization that attains fundamental bounds derived by Weiss and Weinstein. These bounds state that, in addition to improving with time-bandwidth product and signal-to-noise ratio (SNR), timing accuracy also improves as the carrier frequency gets larger, if the SNR is above a threshold. Our algorithm essentially follows the logic of the Weiss-Weinstein bound, and has the following stages: coarse estimation using time domain samples, fine-grained estimation using a Newton algorithm in the frequency domain, and final refinement to within a small fraction of a carrier cycle. While the results here are of fundamental interest, we are motivated to push the limits of synchronization to enable the tight coordination required for emulating virtual antenna arrays using a collection of cooperating nodes.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77221237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Audio event detection from acoustic unit occurrence patterns
Anurag Kumar, Pranay Dighe, Rita Singh, Sourish Chaudhuri, B. Raj
{"title":"Audio event detection from acoustic unit occurrence patterns","authors":"Anurag Kumar, Pranay Dighe, Rita Singh, Sourish Chaudhuri, B. Raj","doi":"10.1109/ICASSP.2012.6287923","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6287923","url":null,"abstract":"In most real-world audio recordings, we encounter several types of audio events. In this paper, we develop a technique for detecting signature audio events, that is based on identifying patterns of occurrences of automatically learned atomic units of sound, which we call Acoustic Unit Descriptors or AUDs. Experiments show that the methodology works as well for detection of individual events and their boundaries in complex recordings.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82164850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
A Bayesian framework for robust speech enhancement under varying contexts
D. Hanumantha Rao Naidu, Sriram Srinivasan
{"title":"A Bayesian framework for robust speech enhancement under varying contexts","authors":"D. Hanumantha Rao Naidu, Sriram Srinivasan","doi":"10.1109/ICASSP.2012.6288932","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288932","url":null,"abstract":"Single-microphone speech enhancement algorithms that employ trained codebooks of parametric representations of speech spectra have been shown to be successful in the suppression of non-stationary noise, e.g., in mobile phones. In this paper, we introduce the concept of a context-dependent codebook, and look at two aspects of context: dependency on the particular speaker using the mobile device, and on the acoustic condition during usage (e.g., hands-free mode in a reverberant room). Such context-dependent codebooks may be trained on-line. A new scheme is proposed to appropriately combine the estimates resulting from the context-dependent and context-independent codebooks under a Bayesian framework. Experimental results establish that the proposed approach performs better than the context-independent codebook in the case of a context match and better than the context-dependent codebook in the case of a context mismatch.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82179452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Improving Arabic broadcast transcription using automatic topic clustering
Stephen M. Chu, L. Mangu
{"title":"Improving arabic broadcast transcription using automatic topic clustering","authors":"Stephen M. Chu, L. Mangu","doi":"10.1109/ICASSP.2012.6288907","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288907","url":null,"abstract":"Latent Dirichlet Allocation (LDA) has been shown to be an effective model to augment n-gram language models in speech recognition applications. In this work, we aim to take advantage of the superior unsupervised learning ability of the framework, and use it to uncover topic structure embedded in the corpora in an entirely data-driven fashion. In addition, we describe a bi-level inference and classification method that allows topic clustering at the utterance level while preserving the document-level topic structures. We demonstrate the effectiveness of the proposed topic clustering pipeline in a state-of-the-art Arabic broadcast transcription system. Experiments show that optimizing LM in the LDA topic space leads to 5% reduction in language model perplexity. It is further shown that topic clustering and adaptation is able to attain 0.4% absolute word error rate reduction on the GALE Arabic task.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82185271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Design and implementation of a fully integrated compressed-sensing signal acquisition system
Juhwan Yoo, Stephen Becker, M. Monge, M. Loh, E. Candès, A. Emami-Neyestanak
{"title":"Design and implementation of a fully integrated compressed-sensing signal acquisition system","authors":"Juhwan Yoo, Stephen Becker, M. Monge, M. Loh, E. Candès, A. Emami-Neyestanak","doi":"10.1109/ICASSP.2012.6289123","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6289123","url":null,"abstract":"Compressed sensing (CS) is a topic of tremendous interest because it provides theoretical guarantees and computationally tractable algorithms to fully recover signals sampled at a rate close to its information content. This paper presents the design of the first physically realized fully-integrated CS based Analog-to-Information (A2I) pre-processor known as the Random-Modulation Pre-Integrator (RMPI) [1]. The RMPI achieves 2GHz bandwidth while digitizing samples at a rate 12.5× lower than the Nyquist rate. The success of this implementation is due to a coherent theory/algorithm/hardware co-design approach. This paper addresses key aspects of the design, presents simulation and hardware measurements, and discusses limiting factors in performance.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82383059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
A model structure integration based on a Bayesian framework for speech recognition
Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda
{"title":"A model structure integration based on a Bayesian framework for speech recognition","authors":"Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda","doi":"10.1109/ICASSP.2012.6288996","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288996","url":null,"abstract":"This paper proposes an acoustic modeling technique based on Bayesian framework using multiple model structures for speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by marginalizing model parameters, and its effectiveness in HMM-based speech recognition has been reported. Although the basic idea underlying the Bayesian approach is to treat all parameters as random variables, only one model structure is still selected in the conventional method. Multiple model structures are treated as latent variables in the proposed method and integrated based on the Bayesian framework. Furthermore, we applied deterministic annealing to the training algorithm to estimate appropriate acoustic models. The proposed method effectively utilizes multiple model structures, especially in the early stage of training and this leads to better predictive distributions and improvement of recognition performance.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82419336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalized k-labelset ensemble for multi-label classification
Hung-Yi Lo, Shou-de Lin, H. Wang
{"title":"Generalized k-labelset ensemble for multi-label classification","authors":"Hung-Yi Lo, Shou-de Lin, H. Wang","doi":"10.1109/ICASSP.2012.6288315","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288315","url":null,"abstract":"Label powerset (LP) method is one category of multi-label learning algorithms. It reduces the multi-label classification problem to a multi-class classification problem by treating each distinct combination of labels in the training set as a different class. This paper proposes a basis expansion model for multi-label classification, where a basis function is a LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the multi-label ground truth. We derive an analytic solution to learn the coefficients efficiently. We have conducted experiments using several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. The results show that our method has better or competitive performance against other methods.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82542095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
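The label powerset (LP) reduction mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the basic LP step (each distinct label combination in the training set becomes one class), not the paper's generalized k-labelset ensemble or its learned expansion coefficients. The function names `label_powerset_encode` and `label_powerset_decode` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def label_powerset_encode(
    label_sets: Sequence[Sequence[str]],
) -> Tuple[List[int], Dict[Tuple[str, ...], int]]:
    """Map each distinct label combination to one multi-class id."""
    combo_to_class: Dict[Tuple[str, ...], int] = {}
    class_ids: List[int] = []
    for labels in label_sets:
        # Sort and de-duplicate so that label order does not create new classes.
        combo = tuple(sorted(set(labels)))
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
        class_ids.append(combo_to_class[combo])
    return class_ids, combo_to_class


def label_powerset_decode(
    class_id: int, combo_to_class: Dict[Tuple[str, ...], int]
) -> Tuple[str, ...]:
    """Recover the label combination that a class id stands for."""
    for combo, cid in combo_to_class.items():
        if cid == class_id:
            return combo
    raise KeyError(f"unknown class id: {class_id}")


if __name__ == "__main__":
    # Toy multi-label training targets (made-up labels, for illustration only).
    targets = [["a", "b"], ["b"], ["b", "a"], []]
    ids, mapping = label_powerset_encode(targets)
    print(ids)
    print(label_powerset_decode(ids[1], mapping))
```

Any ordinary multi-class classifier can then be trained on the resulting class ids; predictions are mapped back to label sets with the decode step.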
On the identifiability of multi-observer hidden Markov models
H. Nguyen, M. Roughan
{"title":"On the identifiability of multi-observer hidden Markov models","authors":"H. Nguyen, M. Roughan","doi":"10.1109/ICASSP.2012.6288268","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288268","url":null,"abstract":"Most large attacks on the Internet are distributed. As a result, such attacks are only partially observed by any one Internet service provider (ISP). Detection would be significantly easier with pooled observations, but privacy concerns often limit the information that providers are willing to share. Multi-party secure distributed computation provides a means for combining observations without compromising privacy. In this paper, we show the benefits of this approach, the most notable of which is that combinations of observations solve identifiability problems in existing approaches for detecting network attacks.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82560643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Adaptive parameter selection for asynchronous intrafascicular multi-electrode stimulation
M. A. Frankel, G. Clark, S. Meek, R. Normann, V. J. Mathews
{"title":"Adaptive parameter selection for asynchronous intrafascicular multi-electrode stimulation","authors":"M. A. Frankel, G. Clark, S. Meek, R. Normann, V. J. Mathews","doi":"10.1109/ICASSP.2012.6287993","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6287993","url":null,"abstract":"This paper describes an adaptive algorithm for selecting per-electrode stimulus intensities and inter-electrode stimulation phasing to achieve desired isometric plantar-flexion forces via asynchronous, intrafascicular multi-electrode stimulation. The algorithm employed a linear model of force production and a gradient descent approach for updating the parameters of the model. The adaptively selected model stimulation parameters were validated in experiments in which stimulation was delivered via a Utah Slanted Electrode Array that was acutely implanted in the sciatic nerve of an anesthetized feline. In simulations and experiments, desired steps in force were evoked, and exhibited short time-to-peak (< 0.5 s), low overshoot (< 10%), low steady-state error (< 4%), and low steady-state ripple (< 12%), with rapid convergence of stimulation parameters. For periodic desired forces, the algorithm was able to quickly converge and experimental trials showed low amplitude error (mean error < 10% of maximum force), and short time delay (< 250 ms).","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82590822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Robust speech recognition through selection of speaker and environment transforms
Raghavendra Bilgi, Vikas Joshi, S. Umesh, Luz García, M. C. Benítez
{"title":"Robust speech recognition through selection of speaker and environment transforms","authors":"Raghavendra Bilgi, Vikas Joshi, S. Umesh, Luz García, M. C. Benítez","doi":"10.1109/ICASSP.2012.6288878","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288878","url":null,"abstract":"In this paper, we address the problem of robustness to both noise and speaker-variability in automatic speech recognition (ASR). We propose the use of pre-computed Noise and Speaker transforms, and an optimal combination of these two transforms is chosen during test using maximum-likelihood (ML) criterion. These pre-computed transforms are obtained during training by using data obtained from different noise conditions that are usually encountered for that particular ASR task. The environment transforms are obtained during training using constrained-MLLR (CMLLR) framework, while for speaker-transforms we use the analytically determined linear-VTLN matrices. Even though the exact noise environment may not be encountered during test, the ML-based choice of the closest Environment transform provides “sufficient” cleaning and this is corroborated by experimental results with performance comparable to histogram equalization or Vector Taylor Series approaches on Aurora-2 task. The proposed method is simple since it involves only the choice of pre-computed environment and speaker transforms and therefore, can be applied with very little test data unlike many other speaker and noise-compensation methods.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81343088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0