2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Design and implementation of a fully integrated compressed-sensing signal acquisition system
Juhwan Yoo, Stephen Becker, M. Monge, M. Loh, E. Candès, A. Emami-Neyestanak
{"title":"Design and implementation of a fully integrated compressed-sensing signal acquisition system","authors":"Juhwan Yoo, Stephen Becker, M. Monge, M. Loh, E. Candès, A. Emami-Neyestanak","doi":"10.1109/ICASSP.2012.6289123","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6289123","url":null,"abstract":"Compressed sensing (CS) is a topic of tremendous interest because it provides theoretical guarantees and computationally tractable algorithms to fully recover signals sampled at a rate close to its information content. This paper presents the design of the first physically realized fully-integrated CS based Analog-to-Information (A2I) pre-processor known as the Random-Modulation Pre-Integrator (RMPI) [1]. The RMPI achieves 2GHz bandwidth while digitizing samples at a rate 12.5× lower than the Nyquist rate. The success of this implementation is due to a coherent theory/algorithm/hardware co-design approach. This paper addresses key aspects of the design, presents simulation and hardware measurements, and discusses limiting factors in performance.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 1","pages":"5325-5328"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82383059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
A model structure integration based on a Bayesian framework for speech recognition
Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda
{"title":"A model structure integration based on a Bayesian framework for speech recognition","authors":"Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda","doi":"10.1109/ICASSP.2012.6288996","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288996","url":null,"abstract":"This paper proposes an acoustic modeling technique based on Bayesian framework using multiple model structures for speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by marginalizing model parameters, and its effectiveness in HMM-based speech recognition has been reported. Although the basic idea underlying the Bayesian approach is to treat all parameters as random variables, only one model structure is still selected in the conventional method. Multiple model structures are treated as latent variables in the proposed method and integrated based on the Bayesian framework. Furthermore, we applied deterministic annealing to the training algorithm to estimate appropriate acoustic models. The proposed method effectively utilizes multiple model structures, especially in the early stage of training and this leads to better predictive distributions and improvement of recognition performance.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"4813-4816"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82419336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalized k-labelset ensemble for multi-label classification
Hung-Yi Lo, Shou-de Lin, H. Wang
{"title":"Generalized k-labelset ensemble for multi-label classification","authors":"Hung-Yi Lo, Shou-de Lin, H. Wang","doi":"10.1109/ICASSP.2012.6288315","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288315","url":null,"abstract":"The label powerset (LP) method is one category of multi-label learning algorithms. It reduces the multi-label classification problem to a multi-class classification problem by treating each distinct combination of labels in the training set as a different class. This paper proposes a basis expansion model for multi-label classification, where a basis function is an LP classifier trained on a random k-labelset. The expansion coefficients are learned to minimize the global error between the prediction and the multi-label ground truth. We derive an analytic solution to learn the coefficients efficiently. We have conducted experiments using several benchmark datasets and compared our method with other state-of-the-art multi-label learning methods. The results show that our method performs better than or competitively with the other methods.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"43 1","pages":"2061-2064"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82542095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
On the identifiability of multi-observer hidden Markov models
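The label powerset reduction mentioned in the abstract above (each distinct combination of labels becomes one class) might be sketched as follows. This is only a minimal illustration of the reduction itself, not the paper's k-labelset ensemble; the function names `label_powerset_encode` and `label_powerset_decode` are hypothetical.

```python
def label_powerset_encode(label_sets):
    """Map each distinct label combination to an integer class id.

    label_sets: iterable of label collections, one per training example.
    Returns (class ids per example, mapping from label combination to id).
    """
    combo_to_class = {}
    classes = []
    for labels in label_sets:
        key = frozenset(labels)  # order-independent, hashable combination
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)  # next unused id
        classes.append(combo_to_class[key])
    return classes, combo_to_class


def label_powerset_decode(class_id, combo_to_class):
    """Recover the label combination that a class id stands for."""
    for combo, cid in combo_to_class.items():
        if cid == class_id:
            return set(combo)
    raise KeyError(class_id)


# Hypothetical toy data: four examples with labels drawn from {"a", "b"}.
classes, mapping = label_powerset_encode([{"a", "b"}, {"a"}, {"b", "a"}, set()])
```

Any multi-class classifier can then be trained on `classes`; its predictions are mapped back to label sets with `label_powerset_decode`.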
H. Nguyen, M. Roughan
{"title":"On the identifiability of multi-observer hidden Markov models","authors":"H. Nguyen, M. Roughan","doi":"10.1109/ICASSP.2012.6288268","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288268","url":null,"abstract":"Most large attacks on the Internet are distributed. As a result, such attacks are only partially observed by any one Internet service provider (ISP). Detection would be significantly easier with pooled observations, but privacy concerns often limit the information that providers are willing to share. Multi-party secure distributed computation provides a means for combining observations without compromising privacy. In this paper, we show the benefits of this approach, the most notable of which is that combinations of observations solve identifiability problems in existing approaches for detecting network attacks.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"61 1","pages":"1873-1876"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82560643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Adaptive parameter selection for asynchronous intrafascicular multi-electrode stimulation
M. A. Frankel, G. Clark, S. Meek, R. Normann, V. J. Mathews
{"title":"Adaptive parameter selection for asynchronous intrafascicular multi-electrode stimulation","authors":"M. A. Frankel, G. Clark, S. Meek, R. Normann, V. J. Mathews","doi":"10.1109/ICASSP.2012.6287993","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6287993","url":null,"abstract":"This paper describes an adaptive algorithm for selecting per-electrode stimulus intensities and inter-electrode stimulation phasing to achieve desired isometric plantar-flexion forces via asynchronous, intrafascicular multi-electrode stimulation. The algorithm employed a linear model of force production and a gradient descent approach for updating the parameters of the model. The adaptively selected model stimulation parameters were validated in experiments in which stimulation was delivered via a Utah Slanted Electrode Array that was acutely implanted in the sciatic nerve of an anesthetized feline. In simulations and experiments, desired steps in force were evoked, and exhibited short time-to-peak (< 0.5 s), low overshoot (< 10%), low steady-state error (< 4%), and low steady-state ripple (< 12%), with rapid convergence of stimulation parameters. For periodic desired forces, the algorithm was able to quickly converge and experimental trials showed low amplitude error (mean error < 10% of maximum force), and short time delay (< 250 ms).","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"753-756"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82590822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Robust speech recognition through selection of speaker and environment transforms
Raghavendra Bilgi, Vikas Joshi, S. Umesh, Luz García, M. C. Benítez
{"title":"Robust speech recognition through selection of speaker and environment transforms","authors":"Raghavendra Bilgi, Vikas Joshi, S. Umesh, Luz García, M. C. Benítez","doi":"10.1109/ICASSP.2012.6288878","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288878","url":null,"abstract":"In this paper, we address the problem of robustness to both noise and speaker-variability in automatic speech recognition (ASR). We propose the use of pre-computed Noise and Speaker transforms, and an optimal combination of these two transforms is chosen during test using the maximum-likelihood (ML) criterion. These pre-computed transforms are obtained during training by using data obtained from different noise conditions that are usually encountered for that particular ASR task. The environment transforms are obtained during training using the constrained-MLLR (CMLLR) framework, while for speaker-transforms we use the analytically determined linear-VTLN matrices. Even though the exact noise environment may not be encountered during test, the ML-based choice of the closest Environment transform provides “sufficient” cleaning and this is corroborated by experimental results with performance comparable to histogram equalization or Vector Taylor Series approaches on the Aurora-2 task. The proposed method is simple since it involves only the choice of pre-computed environment and speaker transforms and, therefore, can be applied with very little test data, unlike many other speaker and noise-compensation methods.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"36 1","pages":"4333-4336"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81343088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cartoon-like image reconstruction via constrained ℓp-minimization
S. Hawe, M. Kleinsteuber, K. Diepold
{"title":"Cartoon-like image reconstruction via constrained ℓp-minimization","authors":"S. Hawe, M. Kleinsteuber, K. Diepold","doi":"10.1109/ICASSP.2012.6287984","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6287984","url":null,"abstract":"This paper considers the problem of reconstructing images from only a few measurements. A method is proposed that is based on the theory of Compressive Sensing. We introduce a new prior that combines an ℓp-pseudo-norm approximation of the image gradient and the bounded range of the original signal. Ultimately, this leads to a reconstruction algorithm that works particularly well for Cartoon-like images that commonly occur in medical imagery. The arising optimization task is solved by a Conjugate Gradient method that is capable of dealing with large scale problems and easily adapts to extensions of the prior. To overcome the non-differentiability of the ℓp-pseudo-norm we employ a Huber-loss-like approximation together with a continuation of the smoothing parameter. Numerical results and a comparison with the state-of-the-art methods show the effectiveness of the proposed algorithm.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"717-720"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81351411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Google's cross-dialect Arabic voice search
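To illustrate why smoothing helps with the non-differentiable ℓp-pseudo-norm mentioned in the abstract above, here is a minimal sketch. It assumes one common smoothing, replacing |v|^p with (v^2 + mu)^(p/2), which is not necessarily the Huber-like approximation the authors use, and it works on a plain list of numbers rather than an image gradient. The names `smoothed_lp`, `smoothed_lp_grad`, and the parameter `mu` are hypothetical.

```python
def lp_pseudo_norm(values, p=0.5):
    """Sum of |v|^p; not differentiable at v = 0 when p <= 1."""
    return sum(abs(v) ** p for v in values)


def smoothed_lp(values, p=0.5, mu=1e-3):
    """Assumed smooth surrogate: sum of (v^2 + mu)^(p/2), with mu > 0."""
    return sum((v * v + mu) ** (p / 2.0) for v in values)


def smoothed_lp_grad(values, p=0.5, mu=1e-3):
    """Gradient of the surrogate: p * v * (v^2 + mu)^(p/2 - 1) per entry."""
    return [p * v * (v * v + mu) ** (p / 2.0 - 1.0) for v in values]


# Continuation: shrink mu step by step so the surrogate approaches the
# true pseudo-norm while staying differentiable at every stage.
sample = [0.5, -1.2, 2.0]
gaps = [smoothed_lp(sample, mu=mu) - lp_pseudo_norm(sample)
        for mu in (1e-1, 1e-2, 1e-3)]
```

In a continuation scheme, a gradient-based solver would be run at each value of `mu`, warm-started from the previous solution; the solver itself is omitted here.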
Fadi Biadsy, P. Moreno, Martin Jansche
{"title":"Google's cross-dialect Arabic voice search","authors":"Fadi Biadsy, P. Moreno, Martin Jansche","doi":"10.1109/ICASSP.2012.6288905","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288905","url":null,"abstract":"We present a large scale effort to build a commercial Automatic Speech Recognition (ASR) product for Arabic. Our goal is to support voice search, dictation, and voice control for the general Arabic-speaking public, including support for multiple Arabic dialects. We describe our ASR system design and compare recognizers for five Arabic dialects, with the potential to reach more than 125 million people in Egypt, Jordan, Lebanon, Saudi Arabia, and the United Arab Emirates (UAE). We compare systems built on diacritized vs. non-diacritized text. We also conduct cross-dialect experiments, where we train on one dialect and test on the others. Our average word error rate (WER) is 24.8% for voice search.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"29 1","pages":"4441-4444"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81669827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 42
Sift-based multi-view cooperative tracking for soccer video
Haopeng Li, M. Flierl
{"title":"Sift-based multi-view cooperative tracking for soccer video","authors":"Haopeng Li, M. Flierl","doi":"10.1109/ICASSP.2012.6288054","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288054","url":null,"abstract":"This paper presents a SIFT-based multi-view cooperative tracking scheme for multiple player tracking in soccer games. We assume that future sports events will be captured by an array of fixed high-definition cameras which provide multi-view video sequences. The imagery will then be used to provide a free-viewpoint networked experience. In this work, SIFT features are used to extract the inter-view and inter-frame correlation among related views. Hence, accurate 3D information of each player can be efficiently utilized for real-time multiple player tracking. By sharing the 3D information with all cameras and exploiting the perspective diversity of the multi-camera system, occlusion problems can be solved effectively. The extracted 3D information improves the average reliability of tracking by more than 10% when compared to SIFT-based 2D tracking.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"30 1","pages":"1001-1004"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81742143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Adaptive human silhouette reconstruction based on the exploration of temporal information
Xue Zhou, Xi Li, Tat-Jun Chin, D. Suter
{"title":"Adaptive human silhouette reconstruction based on the exploration of temporal information","authors":"Xue Zhou, Xi Li, Tat-Jun Chin, D. Suter","doi":"10.1109/ICASSP.2012.6288055","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288055","url":null,"abstract":"Human silhouette reconstruction has a wide range of applications in motion analysis, object segmentation and tracking, etc. In this paper, we propose a human silhouette reconstruction method based on the exploration of temporal information. Given a test silhouette, the proposed method aims to find its reliable templates for reconstruction by using the intrinsic temporal relationship among different frames. To effectively obtain such templates, we propose an adaptive criterion based on the non-negative least square optimization. Experimental results on two challenging datasets demonstrate the effectiveness of our method.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"52 1","pages":"1005-1008"},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82097023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2