2013 IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Publications

Regularized Adaboost for content identification
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638224
Honghai Yu, P. Moulin
Abstract: This paper proposes a regularized Adaboost learning algorithm to extract binary fingerprints by filtering and quantizing perceptually significant features. The proposed algorithm extends the recent symmetric pairwise boosting (SPB) algorithm by taking feature sequence correlation into account. An information- and learning-theoretic analysis is given. Significant performance gains over SPB are demonstrated for both audio and video fingerprinting.
Citations: 5
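The abstract does not spell out the regularizer, so the following is only a rough sketch of the general idea: boosting-style selection of threshold-based fingerprint bits, in which a penalty on the correlation of a candidate feature with already-selected features (weighted by a hypothetical parameter lam) stands in for the feature-sequence-correlation regularization. All function and variable names are illustrative, not the authors'.

```python
import numpy as np

def regularized_boosting(X, y, n_bits=8, lam=0.5):
    """Toy AdaBoost-style selection of binary fingerprint bits.

    X   : (n_samples, n_features) real-valued perceptual features
    y   : (n_samples,) labels in {-1, +1} (e.g. matching vs. non-matching pairs)
    lam : weight of the hypothetical correlation penalty standing in for the
          feature-sequence-correlation regularizer described in the paper.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                          # sample weights
    corr = np.abs(np.corrcoef(X, rowvar=False))      # feature correlation matrix
    chosen, learners = [], []

    for _ in range(n_bits):
        best = None
        for j in range(d):
            thr = np.median(X[:, j])                 # simple threshold stump
            h = np.where(X[:, j] > thr, 1, -1)
            err = np.sum(w * (h != y))
            # penalize features correlated with already-selected ones
            pen = lam * max((corr[j, k] for k in chosen), default=0.0)
            if best is None or err + pen < best[0]:
                best = (err + pen, err, j, thr, h)
        _, err, j, thr, h = best
        err = np.clip(err, 1e-6, 1 - 1e-6)
        alpha = 0.5 * np.log((1 - err) / err)        # usual AdaBoost step
        w *= np.exp(-alpha * y * h)
        w /= w.sum()
        chosen.append(j)
        learners.append((j, thr, alpha))
    return learners

# demo on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=200))
print(regularized_boosting(X, y)[:3])
```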
Robust low-complexity multichannel equalization for dereverberation
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6637736
Felicia Lim, P. Naylor
Abstract: Multichannel equalization of acoustic impulse responses (AIRs) is an important approach to dereverberation. Since AIRs are inevitably estimated with system identification error (SIE), equalization designs must be robust to such SIE for dereverberation processing to be beneficial. We present a novel subband equalizer employing the relaxed multichannel least squares (RMCLS) algorithm in each subband. We show that this new structure improves dereverberation performance while reducing the computational load by a factor of more than 90 in our experiments. We then develop a novel controller for the subband dereverberation processing that guarantees robustness even to very severe SIEs by backing off dereverberation in any subband with excessively high SIE.
Citations: 2
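As a point of reference only, the sketch below designs a plain multichannel least-squares (MINT-style) equalizer from possibly erroneous AIR estimates; it omits the RMCLS relaxation, the subband decomposition and the SIE-dependent back-off controller that the paper actually contributes. The toy channels and target delay are assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz

def convmtx(h, n):
    """Convolution matrix so that convmtx(h, n) @ x == np.convolve(h, x)."""
    col = np.concatenate([h, np.zeros(n - 1)])
    row = np.zeros(n)
    row[0] = h[0]
    return toeplitz(col, row)

def ls_equalizer(airs, eq_len, delay):
    """Least-squares multichannel equalizer design (no relaxation).

    airs   : list of M acoustic impulse responses (possibly with estimation error)
    eq_len : length of each channel's equalizing filter
    delay  : target delay of the equalized overall response
    """
    L = len(airs[0])
    out_len = L + eq_len - 1
    H = np.hstack([convmtx(np.asarray(h), eq_len) for h in airs])  # (out_len, M*eq_len)
    d = np.zeros(out_len)
    d[delay] = 1.0                              # desired overall response: a delayed impulse
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(len(airs), eq_len), H @ g  # per-channel filters, equalized response

# toy example: two random exponentially decaying channels
rng = np.random.default_rng(1)
airs = [rng.normal(size=64) * np.exp(-0.05 * np.arange(64)) for _ in range(2)]
g, eq = ls_equalizer(airs, eq_len=63, delay=10)
print("peak at sample", np.argmax(np.abs(eq)),
      "| residual energy", np.sum(eq ** 2) - eq[10] ** 2)
```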
Robust joint sparse recovery on data with outliers
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638373
Ozgur Balkan, K. Kreutz-Delgado, S. Makeig
Abstract: We propose a method to solve the multiple measurement vector (MMV) sparse signal recovery problem robustly when the data contain outlier points that do not fit the shared sparsity structure otherwise present in the data. This scenario occurs frequently in applications of MMV models because source dynamics are only partially known. The proposed algorithm modifies MMV-based sparse Bayesian learning (M-SBL) by incorporating the idea of least trimmed squares (LTS), previously developed for robust linear regression. Experiments show a significant performance improvement over conventional M-SBL under different outlier ratios and amplitudes.
Citations: 2
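A minimal illustration of the least-trimmed-squares idea applied to joint sparse recovery, with simultaneous OMP standing in for M-SBL as the inner solver for brevity: fit on a tentative inlier set, rank all measurement vectors by residual, keep the best-fitting fraction and refit. Parameter names and the keep ratio are assumptions.

```python
import numpy as np

def somp(A, Y, k):
    """Simultaneous OMP: one shared support for all columns of Y."""
    support, R = [], Y.copy()
    for _ in range(k):
        scores = np.linalg.norm(A.T @ R, axis=1)       # joint correlation per atom
        support.append(int(np.argmax(scores)))
        X_s, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        R = Y - A[:, support] @ X_s
    X = np.zeros((A.shape[1], Y.shape[1]))
    X[support, :] = X_s
    return X

def trimmed_joint_recovery(A, Y, k, keep_ratio=0.8, n_iter=5):
    """LTS-flavoured robust MMV recovery: iteratively refit on the
    measurement vectors with the smallest residuals (the inliers)."""
    n_keep = max(1, int(keep_ratio * Y.shape[1]))
    inliers = np.arange(Y.shape[1])
    for _ in range(n_iter):
        X_in = somp(A, Y[:, inliers], k)
        support = np.flatnonzero(np.linalg.norm(X_in, axis=1))
        # residual of every measurement vector against the shared support
        coef, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        res = np.linalg.norm(Y - A[:, support] @ coef, axis=0)
        inliers = np.argsort(res)[:n_keep]             # keep the best-fitting columns
    X = np.zeros((A.shape[1], Y.shape[1]))
    X[support, :] = coef
    return X, inliers

# demo: 20 measurement vectors sharing a 3-sparse support, 3 of them outliers
rng = np.random.default_rng(2)
A = rng.normal(size=(32, 64)); A /= np.linalg.norm(A, axis=0)
X_true = np.zeros((64, 20)); X_true[[3, 17, 40], :] = rng.normal(size=(3, 20))
Y = A @ X_true
Y[:, :3] = rng.normal(size=(32, 3))                    # outlier columns
X_hat, inliers = trimmed_joint_recovery(A, Y, k=3)
print("recovered support:", sorted(np.flatnonzero(np.linalg.norm(X_hat, axis=1))))
```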
Enhancing query expansion for semantic retrieval of spoken content with automatically discovered acoustic patterns
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639283
Hung-yi Lee, Yun-Chiao Li, Cheng-Tao Chung, Lin-Shan Lee
Abstract: Query expansion techniques were originally developed for text information retrieval in order to retrieve documents that do not contain the query terms but are semantically related to the query. This is achieved by assuming that terms frequently occurring in the top-ranked documents of the first-pass retrieval are query-related, and using them to expand the query for a second-pass retrieval. However, when this approach is applied to spoken content retrieval, the inevitable recognition errors and out-of-vocabulary (OOV) problems in ASR prevent many query-related terms from being included in the expanded query, and much of the information carried by the speech signal is lost during recognition and cannot be recovered. In this paper, we propose to use a second ASR engine based on acoustic patterns automatically discovered from the spoken archive used for retrieval. These acoustic patterns are discovered directly from the signal characteristics, and can therefore compensate for the information lost during recognition to a good extent. When a text query is entered, the system generates the first-pass retrieval results based on the transcriptions of the spoken segments obtained via conventional ASR. The acoustic patterns frequently occurring in the spoken segments ranked at the top of the first-pass results are considered query-related, and the spoken segments containing these query-related acoustic patterns are retrieved. In this way, even when some query-related terms are OOV or incorrectly recognized, the segments including these terms can still be retrieved through the acoustic patterns corresponding to them. Preliminary experiments on Mandarin broadcast news gave very encouraging results.
Citations: 12
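A toy sketch of the retrieval flow described above, assuming each spoken segment comes with an ASR transcript and a list of automatically discovered acoustic pattern IDs: rank segments by transcript match, treat patterns frequent in the top-ranked segments as query-related, then pull in additional segments containing those patterns. The data structures and thresholds are hypothetical.

```python
from collections import Counter

def first_pass(query_terms, transcripts):
    """Rank segments by simple term overlap with the ASR transcript."""
    scores = {seg: len(query_terms & set(words)) for seg, words in transcripts.items()}
    return sorted(scores, key=scores.get, reverse=True)

def expand_with_patterns(query_terms, transcripts, patterns, top_n=2, min_count=2):
    """Second pass: treat acoustic patterns frequent in top-ranked segments
    as query-related and retrieve segments that contain them."""
    ranked = first_pass(query_terms, transcripts)
    counts = Counter(p for seg in ranked[:top_n] for p in patterns[seg])
    related = {p for p, c in counts.items() if c >= min_count}
    extra = [seg for seg in transcripts
             if seg not in ranked[:top_n] and related & set(patterns[seg])]
    return ranked[:top_n] + extra

# toy spoken archive: ASR transcripts plus discovered acoustic pattern IDs
transcripts = {
    "seg1": ["economy", "growth"], "seg2": ["economy", "policy"],
    "seg3": ["weather"],           "seg4": ["growht"],   # mis-recognized term
}
patterns = {"seg1": ["p7", "p9"], "seg2": ["p7", "p2"], "seg3": ["p5"], "seg4": ["p7"]}

# seg4 contains no correctly recognized query term but shares acoustic pattern p7,
# so it is still retrieved by the second pass
print(expand_with_patterns({"economy"}, transcripts, patterns))
```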
Bayesian robust adaptive beamforming based on random steering vector with Bingham prior distribution
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638367
O. Besson, S. Bidon
Abstract: We consider robust adaptive beamforming in the presence of steering vector uncertainties. A Bayesian approach is presented in which the steering vector of interest is treated as a random vector with a Bingham prior distribution. Moreover, to also improve robustness against low sample support, the interference-plus-noise covariance matrix R is assigned a non-informative prior distribution that enforces shrinkage towards a scaled identity matrix, similarly to diagonal loading. The minimum mean square distance estimate of the steering vector and the minimum mean square error estimate of R are derived and implemented using a Gibbs sampling strategy. The new beamformer is shown to converge within a limited number of snapshots, despite the presence of steering vector errors.
Citations: 3
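For intuition about the prior only (not the paper's Gibbs sampler), the sketch below evaluates the unnormalized complex Bingham log-density, log p(a) = kappa * |a_bar^H a|^2 + const, i.e. parameter matrix A = kappa * a_bar a_bar^H, for steering vectors at increasing pointing error from the presumed direction. The concentration kappa and the array geometry are assumptions.

```python
import numpy as np

def steering_vector(n_sensors, theta, d=0.5):
    """Uniform linear array steering vector (half-wavelength spacing), unit norm."""
    a = np.exp(-2j * np.pi * d * np.arange(n_sensors) * np.sin(theta))
    return a / np.linalg.norm(a)

def bingham_logpdf_unnormalized(a, a_bar, kappa):
    """Unnormalized complex Bingham log-density for unit-norm a,
    with parameter matrix A = kappa * a_bar a_bar^H."""
    return kappa * np.abs(np.vdot(a_bar, a)) ** 2      # vdot = a_bar^H a

n = 8
a_bar = steering_vector(n, np.deg2rad(10.0))           # presumed (nominal) steering vector
for err_deg in [0.0, 2.0, 5.0, 10.0]:
    a = steering_vector(n, np.deg2rad(10.0 + err_deg))
    print(f"pointing error {err_deg:4.1f} deg -> log-prior "
          f"{bingham_logpdf_unnormalized(a, a_bar, kappa=50.0):7.2f}")
```

The |.|^2 form makes the prior invariant to an overall phase rotation of the steering vector while concentrating probability mass around the presumed direction, which is the property that makes the Bingham distribution a natural choice here.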
A nonlinear dictionary for image reconstruction
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638052
Mathiruban Tharmalingam, K. Raahemifar
Abstract: Complex signals such as images, audio and video recordings can be represented by a large overcomplete dictionary without noticeable compromise in representation quality. Large overcomplete dictionaries with more patterns increase the sparsity of the coding and provide significant improvements in signal representation quality. Overcomplete dictionaries and sparse coding have been successfully applied to compression, denoising and pattern recognition over the last few decades. One particular dictionary, the Discrete Cosine Transform (DCT) dictionary, has seen a great deal of success in image processing applications. Here we propose a novel nonlinear overcomplete dictionary that is sparser than the DCT dictionary while improving the quality of the signal representation. Experimental results demonstrate that the proposed nonlinear dictionary is superior to the DCT dictionary, achieving a higher signal-to-noise ratio (SNR) in the reconstructed images.
Citations: 4
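The proposed nonlinear dictionary is not specified in the abstract, so the sketch below only reproduces the baseline it is compared against: an overcomplete DCT dictionary, with orthogonal matching pursuit used for sparse coding and reconstruction SNR reported. Dictionary size and sparsity level are assumptions.

```python
import numpy as np

def overcomplete_dct(n, n_atoms):
    """Overcomplete DCT-style dictionary: n-sample atoms at n_atoms frequencies."""
    t = np.arange(n).reshape(-1, 1)
    k = np.arange(n_atoms).reshape(1, -1)
    D = np.cos(np.pi * (t + 0.5) * k / n_atoms)
    return D / np.linalg.norm(D, axis=0, keepdims=True)    # unit-norm atoms

def omp(D, x, k_sparse):
    """Orthogonal matching pursuit: greedy sparse coding against dictionary D."""
    support, r = [], x.copy()
    for _ in range(k_sparse):
        support.append(int(np.argmax(np.abs(D.T @ r))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        r = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z

# reconstruct a toy signal built from two DCT atoms plus a little noise
rng = np.random.default_rng(3)
D = overcomplete_dct(n=64, n_atoms=256)
x = 2.0 * D[:, 5] - 1.5 * D[:, 90] + 0.01 * rng.normal(size=64)
z = omp(D, x, k_sparse=2)
snr = 10 * np.log10(np.sum(x ** 2) / np.sum((x - D @ z) ** 2))
print("selected atoms:", np.flatnonzero(z), "| reconstruction SNR (dB):", round(snr, 1))
```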
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639015
Y. Bao, Hui Jiang, Lirong Dai, Cong Liu
Abstract: Recently, the hybrid model combining a deep neural network (DNN) with context-dependent HMMs has achieved dramatic gains over the conventional GMM/HMM method in many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM method under the traditional GMM/HMM framework. Instead of using the DNN as the acoustic model, we use it as a front-end bottleneck (BN) feature extractor to de-correlate long feature vectors concatenated from several consecutive speech frames. More importantly, we propose two novel incoherent training methods to explicitly de-correlate BN features during DNN learning. The first minimizes the coherence of the DNN weight matrices, while the second minimizes the correlation coefficients of the BN features computed over each mini-batch during DNN training. Experimental results on a 70-hour Mandarin transcription task and the 309-hour Switchboard task show that traditional GMM/HMMs using BN features can yield performance comparable to DNN/HMMs. The proposed incoherent training produces a 2-3% additional gain over the baseline BN features. Finally, discriminatively trained GMM/HMMs using incoherently trained BN features consistently surpass the state-of-the-art DNN/HMMs in all evaluated tasks.
Citations: 48
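A rough numpy illustration of the two penalties the abstract describes, written as stand-alone functions rather than as terms inside a DNN training loss: average absolute coherence between weight-matrix columns, and average absolute correlation of bottleneck features over a mini-batch. In actual training these would be weighted and added to the cross-entropy objective and differentiated by the training framework; the weighting and the exact definitions here are assumptions.

```python
import numpy as np

def weight_coherence(W, eps=1e-8):
    """Average absolute cosine similarity between distinct columns of W:
    small values mean the weight-matrix columns are nearly incoherent."""
    Wn = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    G = Wn.T @ Wn
    off = G - np.diag(np.diag(G))
    d = W.shape[1]
    return np.sum(np.abs(off)) / (d * (d - 1))

def feature_correlation(F):
    """Average absolute off-diagonal correlation of bottleneck features,
    computed over one mini-batch F of shape (batch, bn_dim)."""
    C = np.corrcoef(F, rowvar=False)
    off = C - np.diag(np.diag(C))
    d = F.shape[1]
    return np.sum(np.abs(off)) / (d * (d - 1))

# toy check: strongly correlated vs. roughly decorrelated bottleneck activations
rng = np.random.default_rng(4)
F_corr = rng.normal(size=(256, 1)) @ np.ones((1, 40)) + 0.1 * rng.normal(size=(256, 40))
F_white = rng.normal(size=(256, 40))
print("correlated batch :", round(feature_correlation(F_corr), 3))
print("whitened batch   :", round(feature_correlation(F_white), 3))
print("random weights   :", round(weight_coherence(rng.normal(size=(1024, 40))), 3))
```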
Coverage and area spectral efficiency in downlink random cellular networks with channel estimation error
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638492
Yueping Wu, M. McKay, R. Heath
Abstract: We investigate the impact of channel estimation on the performance of downlink random cellular networks. First, we derive a new closed-form expression for the coverage probability under certain practical conditions. We show that, for arbitrary pilot-training length, the coverage probability depends on the user and base station (BS) densities solely through their ratio. Next, we derive the optimal pilot-training length that maximizes the area spectral efficiency (ASE) in several asymptotic regimes, and characterize the dependence of this optimal length on the ratio between the user and BS densities. The ASE loss due to training is shown to be less significant in small-cell networks with a larger base station density.
Citations: 9
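A minimal Monte Carlo sketch of downlink coverage in a random cellular network, with base stations drawn from a Poisson point process, nearest-BS association and Rayleigh fading. It assumes perfect channel knowledge, so it does not reproduce the paper's channel-estimation-error analysis or its closed-form expression; densities, path-loss exponent and noise power are assumptions.

```python
import numpy as np

def coverage_probability(bs_density, threshold_db, alpha=4.0, noise=1e-12,
                         radius=2000.0, n_trials=2000, seed=0):
    """Monte Carlo estimate of P(SINR > threshold) for a typical user at the
    origin, with base stations drawn from a PPP of intensity bs_density (per m^2)."""
    rng = np.random.default_rng(seed)
    thr = 10 ** (threshold_db / 10)
    area = np.pi * radius ** 2
    covered = 0
    for _ in range(n_trials):
        n_bs = rng.poisson(bs_density * area)
        if n_bs == 0:
            continue
        r = radius * np.sqrt(rng.uniform(size=n_bs))    # uniform positions in the disk
        fading = rng.exponential(size=n_bs)             # Rayleigh power fading
        rx = fading * r ** (-alpha)                     # unit transmit power, path loss r^-alpha
        serving = int(np.argmin(r))                     # nearest-BS association
        sinr = rx[serving] / (rx.sum() - rx[serving] + noise)
        covered += sinr > thr
    return covered / n_trials

# in the interference-limited regime, coverage is roughly invariant to BS density
for lam in [1e-5, 5e-5, 2e-4]:                          # base stations per square metre
    print(f"BS density {lam:.0e}: coverage ~ {coverage_probability(lam, threshold_db=0):.2f}")
```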
Voice activity detection based on frequency modulation of harmonics
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638954
Chung-Chien Hsu, Tse-En Lin, Jian-Hueng Chen, T. Chi
Abstract: In this paper, we propose a voice activity detection (VAD) algorithm based on the spectro-temporal modulation structure of input sounds. A multi-resolution spectro-temporal analysis framework is used to inspect prominent speech structures. By comparing the energy of the frequency modulation of harmonics with an adaptive threshold, the proposed VAD distinguishes speech from non-speech. Compared with three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, the proposed VAD performs significantly better in non-stationary noise, in terms of receiver operating characteristic (ROC) curves and recognition rates from a practical distributed speech recognition (DSR) system.
Citations: 12
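Only the decision mechanism is sketched here: a per-frame feature compared against a slowly adapting noise floor. Plain frame log-energy stands in for the harmonic frequency-modulation energy feature used in the paper, and the margin and smoothing constants are assumptions.

```python
import numpy as np

def adaptive_threshold_vad(x, fs, frame_ms=25, hop_ms=10, margin_db=6.0, alpha=0.95):
    """Frame-level VAD: compare a per-frame feature (here log frame energy, a
    stand-in for the harmonic FM-energy feature) to a slowly tracked noise floor."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + (len(x) - frame) // hop
    decisions = np.zeros(n_frames, dtype=bool)
    noise_floor = None
    for i in range(n_frames):
        seg = x[i * hop: i * hop + frame]
        feat = 10 * np.log10(np.mean(seg ** 2) + 1e-12)   # frame log energy (dB)
        if noise_floor is None:
            noise_floor = feat                             # initialise from the first frame
        decisions[i] = feat > noise_floor + margin_db
        if not decisions[i]:                               # update the floor on non-speech frames
            noise_floor = alpha * noise_floor + (1 - alpha) * feat
    return decisions

# toy signal: noise, then a 200 Hz "voiced" burst, then noise again
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(5)
x = 0.01 * rng.normal(size=fs)
x[3000:6000] += 0.2 * np.sin(2 * np.pi * 200 * t[3000:6000])
d = adaptive_threshold_vad(x, fs)
print("speech frames:", np.flatnonzero(d).min(), "to", np.flatnonzero(d).max())
```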
Joint source-channel coding of 3D video using multiview coding
2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6638014
Arash Vosoughi, Vanessa Testoni, P. Cosman, L. Milstein
Abstract: We consider the joint source-channel coding problem of a 3D video transmitted over an AWGN channel. The goal is to minimize the total number of bits, which is the sum of the number of source bits and the number of forward error correction bits, under two constraints: the quality of the primary view and the quality of the secondary view at the receiver must each be greater than or equal to a predetermined threshold. Quality is measured as the expected PSNR of an entire decoded group of pictures. An MVC (multiview coding) encoder is used as the source encoder, and rate-compatible punctured turbo codes are used to protect the encoded 3D video over the noisy channel. Equal error protection and unequal error protection are compared for various 3D video sequences and noise levels.
Citations: 9
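A toy sketch of the optimization stated in the abstract: pick the smallest source-plus-FEC bit budget whose expected PSNR meets the thresholds for both views. The rate/PSNR table and candidate FEC overheads below are entirely hypothetical placeholders for the encoder and turbo-code operating points.

```python
# hypothetical expected PSNR values (dB) for one GOP, indexed by
# (source_rate_kbit, fec_overhead) -> (primary_view_psnr, secondary_view_psnr)
psnr_table = {
    (400, 0.10): (33.1, 31.0), (400, 0.25): (34.8, 32.5),
    (600, 0.10): (35.2, 33.4), (600, 0.25): (36.9, 34.8),
    (800, 0.10): (36.0, 34.1), (800, 0.25): (37.6, 35.5),
}

def min_total_bits(psnr_table, psnr_min_primary, psnr_min_secondary):
    """Smallest source+FEC budget meeting both per-view quality constraints."""
    best = None
    for (src_kbit, fec), (p1, p2) in psnr_table.items():
        total = src_kbit * (1 + fec)                  # source bits plus FEC overhead
        if p1 >= psnr_min_primary and p2 >= psnr_min_secondary:
            if best is None or total < best[0]:
                best = (total, src_kbit, fec)
    return best                                       # (total_kbit, source_kbit, fec_overhead)

print(min_total_bits(psnr_table, psnr_min_primary=35.0, psnr_min_secondary=33.0))
```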