{"title":"Regularized Adaboost for content identification","authors":"Honghai Yu, P. Moulin","doi":"10.1109/ICASSP.2013.6638224","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638224","url":null,"abstract":"This paper proposes a regularized Adaboost learning algorithm to extract binary fingerprints by filtering and quantizing perceptually significant features. The proposed algorithm extends the recent symmetric pairwise boosting (SPB) algorithm by taking feature sequence correlation into account. Information and learning theoretic analysis is given. Significant performance gains over SPB are demonstrated for both audio and video fingerprinting.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129337444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust low-complexity multichannel equalization for dereverberation","authors":"Felicia Lim, P. Naylor","doi":"10.1109/ICASSP.2013.6637736","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6637736","url":null,"abstract":"Multichannel equalization of acoustic impulse responses (AIRs) is an important approach for dereverberation. Since AIRs are inevitably estimated with system identification error (SIE), it is necessary to develop equalization designs that are robust to such SIE, in order for dereverberation processing to be beneficial. We present here a novel subband equalizer employing the relaxed multichannel least squares (RMCLS) algorithm in each subband. We show that this new structure brings improved performance in dereverberation as well as a reduction in computational load by up to a factor of more than 90 in our experiments. We then develop a novel controller for the dereverberation processing in subbands that guarantees robustness to even very severe SIEs by backing off dereverberation in any subband with excessively high levels of SIEs.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124991510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust joint sparse recovery on data with outliers","authors":"Ozgur Balkan, K. Kreutz-Delgado, S. Makeig","doi":"10.1109/ICASSP.2013.6638373","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638373","url":null,"abstract":"We propose a method to solve the multiple measurement vector (MMV) sparse signal recovery problem in a robust manner when data contains outlier points which do not fit the shared sparsity structure otherwise contained in the data. This scenario occurs frequently in the applications of MMV models due to only partially known source dynamics. The algorithm we propose is a modification of MMV-based sparse bayesian learning (M-SBL) by incorporating the idea of least trimmed squares (LTS), which has previously been developed for robust linear regression. Experiments show a significant performance improvement over the conventional M-SBL under different outlier ratios and amplitudes.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123015697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing query expansion for semantic retrieval of spoken content with automatically discovered acoustic patterns","authors":"Hung-yi Lee, Yun-Chiao Li, Cheng-Tao Chung, Lin-Shan Lee","doi":"10.1109/ICASSP.2013.6639283","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639283","url":null,"abstract":"Query expansion techniques were originally developed for text information retrieval in order to retrieve the documents not containing the query terms but semantically related to the query. This is achieved by assuming the terms frequently occurring in the top-ranked documents in the first-pass retrieval results to be query-related and using them to expand the query to do the second-pass retrieval. However, when this approach was used for spoken content retrieval, the inevitable recognition errors and the OOV problems in ASR make it difficult for many query-related terms to be included in the expanded query, and much of the information carried by the speech signal is lost during recognition and not recoverable. In this paper, we propose to use a second ASR engine based on acoustic patterns automatically discovered from the spoken archive used for retrieval. These acoustic patterns are discovered directly based on the signal characteristics, and therefore can compensate for the information lost during recognition to a good extent. When a text query is entered, the system generates the first-pass retrieval results based on the transcriptions of the spoken segments obtained via the conventional ASR. The acoustic patterns frequently occurring in the spoken segments ranked on top of the first-pass results are considered as query-related, and the spoken segments containing these query-related acoustic patterns are retrieved. In this way, even though some query-related terms are OOV or incorrectly recognized, the segments including these terms can still be retrieved by acoustic patterns corresponding to these terms. Preliminary experiments performed on Mandarin broadcast news offered very encouraging results.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121165186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian robust adaptive beamforming based on random steering vector with bingham prior distribution","authors":"O. Besson, S. Bidon","doi":"10.1109/ICASSP.2013.6638367","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638367","url":null,"abstract":"We consider robust adaptive beamforming in the presence of steering vector uncertainties. A Bayesian approach is presented where the steering vector of interest is treated as a random vector with a Bingham prior distribution. Moreover, in order to also improve robustness against low sample support, the interference plus noise covariance matrix R is assigned a non informative prior distribution which enforces shrinkage to a scaled identity matrix, similarly to diagonal loading. The minimum mean square distance estimate of the steering vector as well as the minimum mean square error estimate of R are derived and implemented using a Gibbs sampling strategy. The new beamformer is shown to converge within a limited number of snapshots, despite the presence of steering vector errors.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134457717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A nonlinear dictionary for image reconstruction","authors":"Mathiruban Tharmalingam, K. Raahemifar","doi":"10.1109/ICASSP.2013.6638052","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638052","url":null,"abstract":"Complex signals such as images, audio and video recordings can be represented by a large over complete dictionary without distinguishable compromise on the representation quality. Large over complete dictionaries with more patterns can be used to increase the sparse coding as well as provide significant improvements in signal representation quality. The use of the over-complete dictionaries and sparse coding has been successfully applied in compression, de-noising, and pattern recognition applications within the last few decades. One particular dictionary, the Discrete Cosine Transform (DCT) dictionary has seen a great deal of success in image processing applications. However, we propose a novel non-linear over-complete dictionary that is sparser than the DCT dictionary while improving the quality of the signal representation. The proposed non-linear dictionary has demonstrated through experimental results to be superior to the DCT dictionary by achieving higher signal to noise ratio (SNR) in the reconstructed images.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133352518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition","authors":"Y. Bao, Hui Jiang, Lirong Dai, Cong Liu","doi":"10.1109/ICASSP.2013.6639015","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639015","url":null,"abstract":"Recently, the hybrid model combining deep neural network (DNN) with context-dependent HMMs has achieved some dramatic gains over the conventional GMM/HMM method in many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM method under the traditional GMM/HMM framework. Instead of using DNN as acoustic model, we use DNN as a front-end bottleneck (BN) feature extraction method to decorrelate long feature vectors concatenated from several consecutive speech frames. More importantly, we have proposed two novel incoherent training methods to explicitly de-correlate BN features in learning of DNN. The first method relies on minimizing coherence of weight matrices in DNN while the second one attempts to minimize correlation coefficients of BN features calculated in each mini-batch data in DNN training. Experimental results on a 70-hr Mandarin transcription task and the 309-hr Switchboard task have shown that the traditional GMM/HMMs using BN features can yield comparable performance as DNN/HMM. The proposed incoherent training can produce 2-3% additional gain over the baseline BN features. Finally, the discriminatively trained GMM/HMMs using incoherently trained BN features have consistently surpassed the state-of-the-art DNN/HMMs in all evaluated tasks.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129352683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coverage and area spectral efficiency in downlink random cellular networks with channel estimation error","authors":"Yueping Wu, M. Mckay, R. Heath","doi":"10.1109/ICASSP.2013.6638492","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638492","url":null,"abstract":"We investigate the impact of channel estimation on the performance of downlink random cellular networks. First, we derive a new closed-form expression for the coverage probability under certain practical conditions. We show that the coverage probability is dependent on the user and base station (BS) densities solely through their ratio for arbitrary pilot-training length. Next, we derive the optimal pilot-training length that maximizes the area spectral efficiency (ASE) in several asymptotic regimes, and capture the dependence of this optimal length on the ratio between the user and BS densities. The ASE loss due to training is shown to be less significant in small cell networks with a larger base station density.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123829936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voice activity detection based on frequency modulation of harmonics","authors":"Chung-Chien Hsu, Tse-En Lin, Jian-Hueng Chen, T. Chi","doi":"10.1109/ICASSP.2013.6638954","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638954","url":null,"abstract":"In this paper, we propose a voice activity detection (VAD) algorithm based on spectro-temporal modulation structures of input sounds. A multi-resolution spectro-temporal analysis framework is used to inspect prominent speech structures. By comparing with an adaptive threshold, the proposed VAD distinguishes speech from non-speech based on the energy of the frequency modulation of harmonics. Compared with three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, our proposed VAD significantly outperforms them in non-stationary noises in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126428944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint source-channel coding of 3D video using multiview coding","authors":"Arash Vosoughi, Vanessa Testoni, P. Cosman, L. Milstein","doi":"10.1109/ICASSP.2013.6638014","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638014","url":null,"abstract":"We consider the joint source-channel coding problem of a 3D video transmitted over an AWGN channel. The goal is to minimize the total number of bits, which is the sum of the number of source bits and the number of forward error correction bits, under two constraints: the quality of the primary view and the quality of the secondary view must be greater than or equal to a predetermined threshold at the receiver. The quality is measured in terms of the expected PSNR of an entire decoded group of pictures. A MVC (multiview coding) encoder is used as the source encoder, and rate compatible punctured turbo codes are utilized for protection of the encoded 3D video over the noisy channel. Equal error protection and unequal error protection are compared for various 3D video sequences and noise levels.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126552722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}