2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Generalized Wiener filtering with fractional power spectrograms
A. Liutkus, R. Badeau
DOI: https://doi.org/10.1109/ICASSP.2015.7177973
Published: 2015-04-19
Abstract: In recent years, many studies have focused on the single-sensor separation of independent waveforms using so-called soft-masking strategies, where the short-time Fourier transform of the mixture is multiplied element-wise by a ratio of spectrogram models. When the signals are wide-sense stationary, this strategy is theoretically justified as optimal Wiener filtering: the power spectrograms of the sources are assumed to add up to the power spectrogram of the mixture. However, experience shows that using fractional spectrograms instead, such as amplitude spectrograms, yields good performance in practice, because they empirically fit the additivity assumption better. To the best of our knowledge, no probabilistic interpretation of this filtering procedure had been available to date. In this paper, we show that assuming the additivity of fractional spectrograms for the purpose of building soft masks can be understood as separating locally stationary α-stable harmonizable processes (α-harmonizable in short), thus justifying the procedure theoretically.
Citations: 93
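
The soft-masking rule described above can be sketched as follows. This is a minimal illustration: it assumes the per-source magnitude models |S_j| are already estimated and flattens the time-frequency bins into plain lists. The names `soft_masks` and `separate` are hypothetical. With exponent alpha = 2 it reduces to the classical power (Wiener) mask, and with alpha = 1 it gives the amplitude mask.

```python
# Sketch: fractional-power soft masking (generalized Wiener filtering).
# Assumption: per-source magnitude spectrogram models |S_j| are given;
# bins are flattened into plain lists for brevity (hypothetical layout).

def soft_masks(magnitudes, alpha=1.0, eps=1e-12):
    """One mask per source: |S_j|^alpha / sum_k |S_k|^alpha, bin by bin.

    alpha = 2.0 gives the classical power (Wiener) mask,
    alpha = 1.0 the amplitude mask.
    """
    powered = [[m ** alpha for m in src] for src in magnitudes]
    totals = [sum(bins) + eps for bins in zip(*powered)]
    return [[p / t for p, t in zip(src, totals)] for src in powered]


def separate(mixture_stft, magnitudes, alpha=1.0):
    """Multiply the complex mixture STFT element-wise by each source's mask."""
    masks = soft_masks(magnitudes, alpha)
    return [[g * x for g, x in zip(mask, mixture_stft)] for mask in masks]


mix = [1 + 1j, 2 - 1j, 0.5j]                 # toy mixture STFT bins
models = [[1.0, 0.2, 0.0], [1.0, 0.8, 1.0]]  # toy magnitude models, 2 sources
estimates = separate(mix, models, alpha=1.0)
```

The masks sum to one in every bin, so the source estimates add back up to the mixture (up to `eps`) whatever alpha is chosen. Only the way energy is shared between the sources changes.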

Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?
A. Alkhateeb, G. Leus, R. Heath
DOI: https://doi.org/10.1109/ICASSP.2015.7178503
Published: 2015-04-19
Abstract: Millimeter wave (mmWave) systems will likely employ directional beamforming with large antenna arrays at both the transmitters and receivers. Acquiring the channel knowledge needed to design these beamformers, however, is challenging because of the large antenna arrays and the low signal-to-noise ratio before beamforming. In this paper, we propose and evaluate a downlink system operation for multi-user mmWave systems based on compressed sensing channel estimation and conjugate analog beamforming. Adopting the achievable sum-rate as a performance metric, we show how many compressed sensing measurements are needed to approach the performance obtained with perfect channel knowledge. The results illustrate that the proposed algorithm requires an order of magnitude less training overhead than traditional lower-frequency solutions, while employing mmWave-suitable hardware. They also show that the number of measurements needs to be optimized to handle the trade-off between channel estimate quality and training overhead.
Citations: 241
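
The pairing of compressed-sensing channel estimation with conjugate beamforming can be illustrated on a toy problem. This sketch is not the authors' algorithm, and it makes several assumptions:

- a half-wavelength uniform linear array;
- a single-path, noise-free channel whose angle lies exactly on a dictionary grid;
- random-phase measurement beams.

One greedy (OMP-style) step picks the grid atom that best matches the few compressed measurements. A unit-norm beamformer is then matched to that direction. All function names are hypothetical.

```python
import cmath
import math
import random

# Sketch: grid-based compressed-sensing estimate of a single-path channel,
# then conjugate (matched) beamforming. Assumptions: half-wavelength ULA,
# the path's angle lies exactly on the dictionary grid, no noise.

def steering(n_ant, k, grid):
    """Array response for grid index k (spatial frequency 2*pi*k/grid)."""
    psi = 2 * math.pi * k / grid
    return [cmath.exp(1j * psi * n) for n in range(n_ant)]


def apply(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def estimate_path(y, phi, n_ant, grid):
    """One greedy (OMP-style) step: best normalised match of y to compressed atoms."""
    best_k, best_score, best_gain = None, -1.0, 0j
    for k in range(grid):
        atom = apply(phi, steering(n_ant, k, grid))
        energy = inner(atom, atom).real
        corr = inner(atom, y)
        score = abs(corr) / math.sqrt(energy)
        if score > best_score:
            best_k, best_score, best_gain = k, score, corr / energy
    return best_k, best_gain


def conjugate_beamformer(n_ant, k, grid):
    """Unit-norm beamformer matched to the estimated direction."""
    return [a / math.sqrt(n_ant) for a in steering(n_ant, k, grid)]


rng = random.Random(0)
N, G, M = 16, 32, 6                  # antennas, grid size, measurements (M < N)
true_k, alpha = 11, 0.8 - 0.3j
h = [alpha * a for a in steering(N, true_k, G)]
# Random-phase measurement beams: a phase-shifter-friendly choice, assumed here.
phi = [[cmath.exp(1j * rng.uniform(0, 2 * math.pi)) / math.sqrt(N) for _ in range(N)]
       for _ in range(M)]
y = apply(phi, h)
k_hat, alpha_hat = estimate_path(y, phi, N, G)
w = conjugate_beamformer(N, k_hat, G)
gain = abs(inner(w, h)) ** 2         # N * |alpha|^2 when k_hat == true_k
```

In this noiseless, on-grid case the normalised correlation peaks at the true atom (by the Cauchy-Schwarz inequality), so 6 measurements suffice for 16 antennas. With noise and multiple paths, the measurement count becomes the trade-off the abstract describes.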

Multi-frame factorisation for long-span acoustic modelling
Liang Lu, S. Renals
DOI: https://doi.org/10.1109/ICASSP.2015.7178841
Published: 2015-04-19
Abstract: Acoustic models based on Gaussian mixture models (GMMs) typically use short-span acoustic feature inputs. Owing to the conditional independence assumption of hidden Markov models, this does not capture long-term temporal information in speech. In this paper, we present an implicit approach that approximates the joint distribution of long-span features by a product of factorized models, in contrast to deep neural networks (DNNs), which model feature correlations directly. The approach is applicable to a broad range of acoustic models. We present experiments using GMM- and probabilistic linear discriminant analysis (PLDA)-based models on Switchboard, observing consistent word error rate reductions.
Citations: 0
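
A minimal sketch of approximating a long-span joint density by a product of factorized models, under a hypothetical layout. Each HMM state is assumed to hold one diagonal Gaussian per frame offset k in [-K, K]. The log-likelihood of the window around frame t is then the sum of the per-offset log-densities. The paper's actual factorization and model types (GMM, PLDA) are richer than this, and `multiframe_loglik` is an illustrative name.

```python
import math

# Sketch: score a long span of frames with a product of per-offset models.
# Assumption (hypothetical layout): each HMM state has one diagonal Gaussian
# per frame offset k in [-K, K]; the joint density of the window is
# approximated by the product of these factors, i.e. a sum of log-densities.

def diag_gauss_logpdf(x, mean, var):
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def multiframe_loglik(frames, t, offset_models):
    """log prod_k p_k(x_{t+k}); frames beyond the utterance edges are clamped."""
    total = 0.0
    for k, (mean, var) in offset_models.items():
        idx = min(max(t + k, 0), len(frames) - 1)
        total += diag_gauss_logpdf(frames[idx], mean, var)
    return total


frames = [[0.0], [1.0], [2.0], [3.0]]
state_models = {-1: ([0.0], [1.0]), 0: ([1.0], [1.0]), 1: ([2.0], [1.0])}
score_t1 = multiframe_loglik(frames, 1, state_models)   # K = 1 window around t = 1
```

Because the factors are independent given the state, each can be trained like an ordinary short-span model. This is what keeps the approach implicit rather than modelling the full cross-frame covariance.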

Weak interference direction of arrival estimation in the GPS L1 frequency band
Zili Xu, M. Trinkle, D. Gray
DOI: https://doi.org/10.1109/ICASSP.2015.7178451
Published: 2015-04-19
Abstract: Due to its low received power, a GPS signal is vulnerable to both intentional and unintentional interference. This paper considers the problem of estimating, with a GPS antenna array, the direction of arrival (DOA) of a weak interference whose power is comparable to, or even lower than, that of the GPS signals. To achieve this, a multiple subspace projection algorithm is proposed that cancels the GPS signals, treating them as relatively strong interfering sources. Comparisons with the Partitioned Subspace Projection (PSP) method are presented using simulations. Experimental results show that the DOA of an interference with an SNR of -20 dB in the GPS L1 band can be accurately estimated.
Citations: 6
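
The idea of cancelling the GPS signals before searching for the weak interferer can be sketched with a generic orthogonal projection followed by a Bartlett scan. This is a textbook stand-in, not the paper's multiple subspace projection algorithm. It makes the following assumptions:

- a half-wavelength uniform linear array;
- known directions for the strong signals;
- noise-free toy snapshots with made-up amplitudes.

Function names are hypothetical.

```python
import cmath
import math
import random

# Sketch: cancel known strong signals with an orthogonal projection, then scan
# for a weak source. Assumptions: half-wavelength ULA, known directions for the
# strong (GPS) signals, noise-free toy snapshots with made-up amplitudes.

def steering(n_ant, theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * n * s) for n in range(n_ant)]


def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))


def orthonormal_basis(vectors):
    """Gram-Schmidt on the steering vectors of the signals to cancel."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wi / norm for wi in w])
    return basis


def project_out(x, basis):
    """Apply P = I - Q Q^H, removing the span of the strong signals from x."""
    for q in basis:
        c = inner(q, x)
        x = [xi - c * qi for xi, qi in zip(x, q)]
    return x


def scan(snapshots, basis, n_ant, grid):
    """Bartlett spectrum of the projected data; returns the peak angle."""
    proj = [project_out(x, basis) for x in snapshots]
    best, best_p = None, -1.0
    for theta in grid:
        a = project_out(steering(n_ant, theta), basis)
        na = inner(a, a).real
        if na < 1e-9:        # angle lies (almost) inside the cancelled subspace
            continue
        p = sum(abs(inner(a, x)) ** 2 for x in proj) / na
        if p > best_p:
            best, best_p = theta, p
    return best


rng = random.Random(1)
N = 8
strong = [steering(N, 10), steering(N, -30)]
weak = steering(N, 45)
snapshots = []
for _ in range(50):
    s1, s2 = (10 * cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(2))
    sw = 0.1 * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    snapshots.append([s1 * a + s2 * b + sw * c
                      for a, b, c in zip(strong[0], strong[1], weak)])
basis = orthonormal_basis(strong)
doa = scan(snapshots, basis, N, range(-90, 91))
```

Without the projection, the scan would lock onto the 10° or -30° sources, which are 40 dB stronger. After projection, the remaining spectrum peaks at the weak source's 45°.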

A deep neural network approach to speech bandwidth expansion
Kehuang Li, Chin-Hui Lee
DOI: https://doi.org/10.1109/ICASSP.2015.7178801
Published: 2015-04-19
Abstract: We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) that estimates the spectral mapping function from narrowband (4 kHz bandwidth) to wideband (8 kHz bandwidth) speech. Log power spectra are used as the input and output features to perform the required nonlinear transformation, and DNNs are trained to realize this high-dimensional mapping function. Evaluating the proposed approach on a large-scale 10-hour test set, we found that DNN-expanded speech signals give excellent objective quality measures in terms of segmental signal-to-noise ratio and log-spectral distortion when compared with conventional BWE based on Gaussian mixture models (GMMs). Subjective listening tests give a 69% preference for DNN-expanded speech versus 31% for GMM-expanded speech when the phase information is assumed known. In tests under realistic operation, where the phase information is imaged from the given narrowband signal, the preference rises to 84% versus 16%. Correct phase recovery can further improve the BWE performance of the proposed DNN method.
Citations: 114
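
Log power spectrum features of the kind used as DNN input and output can be computed as below. This is a minimal sketch, and several choices are assumptions rather than details from the paper:

- the Hann window;
- the naive DFT;
- the frame length;
- the simplification of taking the lower half of a 16 kHz wideband frame's bins (0 to 4 kHz) as the "narrowband" input and the upper half (4 to 8 kHz) as the regression target.

A real system would derive the input from the downsampled narrowband signal itself. Function names are hypothetical.

```python
import cmath
import math

# Sketch: log power spectrum features for a DNN bandwidth-expansion mapping.
# Assumptions (not from the paper): Hann window, naive DFT, and a wideband
# frame whose lower half of bins (0-4 kHz at 16 kHz sampling) serves as the
# input and whose upper half (4-8 kHz) is the regression target.

def log_power_spectrum(frame, eps=1e-10):
    """Hann-windowed log power for DFT bins 0 .. len(frame)//2."""
    n = len(frame)
    x = [f * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, f in enumerate(frame)]
    feats = []
    for k in range(n // 2 + 1):
        X = sum(xi * cmath.exp(-2j * math.pi * k * i / n) for i, xi in enumerate(x))
        feats.append(math.log(abs(X) ** 2 + eps))
    return feats


def split_bands(wide_feats):
    """Return (low-band input bins, high-band target bins)."""
    half = (len(wide_feats) - 1) // 2
    return wide_feats[: half + 1], wide_feats[half + 1:]


n = 64
frame = [math.cos(2 * math.pi * 10 * i / n) for i in range(n)]   # tone at bin 10
feats = log_power_spectrum(frame)
low, high = split_bands(feats)
```

The log compression gives the regression target a manageable dynamic range. Phase is discarded here, which is why the abstract treats phase recovery as a separate step.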

A new study of GMM-SVM system for text-dependent speaker recognition
Hanwu Sun, Kong-Aik Lee, B. Ma
DOI: https://doi.org/10.1109/ICASSP.2015.7178761
Published: 2015-04-19
Abstract: This paper presents a new approach to, and a study of, GMM-SVM systems for text-dependent speaker recognition in a fixed pass-phrase scenario. A uniform-split, content-based GMM-SVM system is proposed and applied to text-dependent speaker verification. We conducted a detailed comparison of the proposed method with the baseline GMM-SVM system on the RSR2015 database, which was designed and collected for evaluating text-dependent speaker verification systems. The experimental results show that the new approach can significantly reduce the detection error for the target-wrong error type (i.e., the target speaker with a wrong pass-phrase) while maintaining a low detection error for both the impostor-correct and impostor-wrong error types (i.e., an impostor with the correct pass-phrase and an impostor with a wrong pass-phrase). We also show that score normalization can be applied with respect to the impostor-wrong distribution rather than the impostor-correct distribution.
Citations: 10
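
For reference, the standard GMM supervector behind GMM-SVM systems can be sketched as follows. The UBM means are MAP-adapted to an utterance with a relevance factor and then scaled by sqrt(weight) / sigma, so that a linear SVM kernel approximates a divergence between adapted models. This is the generic baseline construction. The paper's uniform-split, content-based variant is not reproduced, and the UBM data layout and function names are assumptions.

```python
import math

# Sketch: the standard GMM supervector used by GMM-SVM systems (MAP adaptation
# of UBM means, then sqrt(w) * sigma^-1 scaling). The paper's uniform-split,
# content-based variant is not reproduced; the UBM layout is an assumption.

def log_component(x, w, mean, var):
    return math.log(w) - 0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                                   for xi, m, v in zip(x, mean, var))


def gmm_supervector(frames, ubm, relevance=16.0):
    """ubm: list of (weight, mean, var) diagonal Gaussians."""
    dim = len(ubm[0][1])
    counts = [0.0] * len(ubm)
    firsts = [[0.0] * dim for _ in ubm]
    for x in frames:
        logs = [log_component(x, w, m, v) for w, m, v in ubm]
        top = max(logs)
        post = [math.exp(lg - top) for lg in logs]   # unnormalised posteriors
        z = sum(post)
        for c, p in enumerate(post):
            counts[c] += p / z
            firsts[c] = [f + (p / z) * xi for f, xi in zip(firsts[c], x)]
    sv = []
    for c, (w, m, v) in enumerate(ubm):
        adapt = counts[c] / (counts[c] + relevance)  # MAP interpolation weight
        for d in range(dim):
            data_mean = firsts[c][d] / counts[c] if counts[c] > 0 else m[d]
            new_mean = adapt * data_mean + (1 - adapt) * m[d]
            sv.append(math.sqrt(w) * new_mean / math.sqrt(v[d]))
    return sv


ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
sv_empty = gmm_supervector([], ubm)                # no data: scaled UBM means
sv_speaker = gmm_supervector([[5.0]] * 100, ubm)   # means pulled toward the data
```

The relevance factor controls how far sparse data can move a component. Components that see few frames stay close to the UBM, which keeps supervectors of short pass-phrases comparable.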

Selective hole-filling for depth-image based rendering
Adriano Q. de Oliveira, Guilherme P. Fickel, M. Walter, C. Jung
DOI: https://doi.org/10.1109/ICASSP.2015.7178157
Published: 2015-04-19
Abstract: One of the biggest challenges in view interpolation is filling the regions of the synthesized view that have no projected information. In this paper, we present a new approach that identifies and corrects different types of missing information. In the first stage, we propose a fast solution to the problems of cracks and ghosting, which are common artifacts in the view interpolation process. Then, we complete larger holes by exploiting the disparity map as an additional cue for selecting the best patch in a patch-based inpainting procedure. Our experimental results indicate that the approach outperforms current state-of-the-art hole-filling techniques for view interpolation.
Citations: 22
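
Using disparity as an extra cue in patch selection can be illustrated with a deliberately simple cost. The cost is the SSD over the known pixels of the target patch plus `lam` times the candidate's mean disparity. This is an illustrative criterion, not the paper's exact one. It assumes that disoccluded holes belong to the background and that background pixels have low disparity. `best_patch`, `patch_cost` and `lam` are hypothetical names.

```python
# Sketch: pick a fill patch using colour similarity plus a disparity cue.
# Assumptions (illustrative, not the paper's exact criterion): holes exposed by
# view synthesis belong to the background, background has low disparity, so
# candidates are penalised by their mean disparity with weight lam.

def patch(img, r, c, size):
    return [row[c:c + size] for row in img[r:r + size]]


def patch_cost(target, known, candidate, cand_disp, lam):
    """SSD over known target pixels + lam * mean candidate disparity."""
    ssd = sum((t - s) ** 2
              for t_row, k_row, s_row in zip(target, known, candidate)
              for t, k, s in zip(t_row, k_row, s_row) if k)
    flat = [d for row in cand_disp for d in row]
    return ssd + lam * sum(flat) / len(flat)


def best_patch(img, disp, mask, r0, c0, size, lam=1.0):
    """mask[r][c] is True for known pixels; returns the top-left of the best source."""
    target, known = patch(img, r0, c0, size), patch(mask, r0, c0, size)
    best, best_cost = None, float("inf")
    for r in range(len(img) - size + 1):
        for c in range(len(img[0]) - size + 1):
            if not all(all(row) for row in patch(mask, r, c, size)):
                continue  # source patches must be fully known
            cost = patch_cost(target, known, patch(img, r, c, size),
                              patch(disp, r, c, size), lam)
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best


img = [[5, 5, 0, 5, 5, 5],
       [5, 5, 0, 0, 5, 5]]
disp = [[10, 10, 0, 0, 1, 1],      # left: foreground, right: background
        [10, 10, 0, 0, 1, 1]]
mask = [[True, True, False, True, True, True],
        [True, True, False, False, True, True]]
choice = best_patch(img, disp, mask, 0, 2, 2)
```

In the toy image, both fully known candidates match the target's known pixel equally well on intensity alone. The disparity term breaks the tie in favour of the background patch at (0, 4), whereas `lam=0.0` returns the first (foreground) match.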

An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition
Hengguan Huang, K. Sim
DOI: https://doi.org/10.1109/ICASSP.2015.7178844
Published: 2015-04-19
Abstract: The conventional short-term features used as input to deep neural networks (DNNs) do not capture longer-term information. This poses a challenge for training a speaker-independent (SI) DNN, since the short-term features do not provide enough information for the DNN to estimate robust factors of speaker-level variation. The key to this problem is to obtain a sufficiently robust and informative speaker representation. This paper compares several speaker representations. First, a DNN speaker classifier is used to extract bottleneck features as the speaker representation, called the Bottleneck Speaker Vector (BSV). To further improve the robustness of this representation, a first-order Bottleneck Speaker Super Vector (BSSV) is also proposed, in which the BSV is expanded into a super vector space by incorporating the phoneme posterior probabilities. Finally, a finer-grained speaker representation based on FMLLR-shifted features is examined. Experimental results on the WSJ0 and WSJ1 datasets show that the proposed speaker representations are useful for normalising speaker effects in robust DNN-based automatic speech recognition. The best performance is achieved by augmenting both the BSSV and the FMLLR-shifted representations, yielding relative performance gains of 10.0% to 15.3% over the SI DNN baseline.
Citations: 53
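
One plausible reading of the first-order BSSV construction is sketched below. Frame-level bottleneck vectors are weighted by phoneme posteriors, and the occupancy-normalised weighted mean for each phoneme class becomes one block of the super vector. The resulting vector is then appended to every acoustic frame as extra DNN input. The normalisation, the data layout and the function names are assumptions, and the paper's exact definition may differ.

```python
# Sketch: expand frame-level bottleneck speaker vectors into a phoneme-indexed
# super vector and append it to every acoustic frame. Assumption: "first-order"
# is read as posterior-weighted means per phoneme class (first-order statistics
# normalised by occupancy); the paper's exact normalisation may differ.

def bottleneck_super_vector(bsv_frames, phone_posteriors, eps=1e-8):
    dim = len(bsv_frames[0])
    sv = []
    for p in range(len(phone_posteriors[0])):
        occupancy = sum(post[p] for post in phone_posteriors)
        for d in range(dim):
            first = sum(post[p] * b[d] for b, post in zip(bsv_frames, phone_posteriors))
            sv.append(first / (occupancy + eps))
    return sv


def augment(acoustic_frames, speaker_vector):
    """Concatenate the utterance-level speaker vector onto each frame's features."""
    return [list(f) + list(speaker_vector) for f in acoustic_frames]


bsv = [[1.0, 2.0], [3.0, 4.0]]          # two frames of 2-dim bottleneck features
posteriors = [[1.0, 0.0], [0.0, 1.0]]   # hard phoneme posteriors, 2 classes
bssv = bottleneck_super_vector(bsv, posteriors)
dnn_input = augment([[0.1] * 3, [0.2] * 3], bssv)
```

The super vector grows with the number of phoneme classes times the bottleneck dimension. Under this reading, the BSSV trades a larger input layer for speaker statistics that are separated by phonetic content.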

Robust overlapped speech detection and its application in word-count estimation for Prof-Life-Log data
Navid Shokouhi, A. Ziaei, A. Sangwan, J. Hansen
DOI: https://doi.org/10.1109/ICASSP.2015.7178867
Published: 2015-04-19
Abstract: The ability to estimate the number of words spoken by an individual over a certain period of time is valuable in second language acquisition, healthcare, and assessing language development. However, building a robust automatic framework that achieves high accuracy is non-trivial in realistic, naturalistic scenarios because of factors such as different conversation styles or types of noise in audio recordings, especially in multi-party conversations. In this study, we propose a noise-robust overlapped speech detection algorithm to estimate the likelihood of overlapping speech in a given audio file in the presence of environmental noise. This information is embedded into a word-count estimator, which uses a linear minimum mean square error (LMMSE) estimator to predict the number of words from the syllable rate. Syllables are detected using a modified version of the mrate algorithm. The proposed word-count estimator is tested on long-duration files from the Prof-Life-Log corpus. The data were recorded with a LENA recording device worn by a primary speaker in various environments and under different noise conditions. The overlap detection system significantly outperforms the baseline in noisy conditions. Furthermore, applying the overlap detection results to word-count estimation achieves a 35% relative improvement over our previous efforts, which included speech enhancement using spectral subtraction and silence removal.
Citations: 19
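
The LMMSE step that maps syllables to words can be sketched in its scalar form: w_hat = mean_w + cov(w, s) / var(s) * (s - mean_s), with the moments estimated from training pairs. Two assumptions apply. The input is taken to be one scalar per segment (the syllable count, i.e. rate times duration), and the way the overlap likelihood enters the paper's estimator is not reproduced. `fit_lmmse` is a hypothetical name.

```python
# Sketch: scalar LMMSE mapping from detected syllables to word count,
# w_hat = mean_w + cov(w, s) / var(s) * (s - mean_s). Assumptions: one scalar
# feature per segment (syllable count); how the overlap likelihood enters the
# paper's estimator is not reproduced here.

def fit_lmmse(syllables, words):
    n = len(syllables)
    mean_s = sum(syllables) / n
    mean_w = sum(words) / n
    cov = sum((s - mean_s) * (w - mean_w) for s, w in zip(syllables, words)) / n
    var = sum((s - mean_s) ** 2 for s in syllables) / n
    slope = cov / var

    def predict(s):
        return mean_w + slope * (s - mean_s)

    return predict


predict = fit_lmmse([10, 20, 30, 40], [9, 16, 23, 30])   # toy training pairs
```

With only first- and second-order statistics, the estimator is optimal among linear predictors. This makes it cheap to recalibrate when a new recording environment changes the syllable-to-word ratio.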

Binaural multichannel Wiener filter with directional interference rejection
E. Hadad, Daniel Marquardt, S. Doclo, S. Gannot
DOI: https://doi.org/10.1109/ICASSP.2015.7178048
Published: 2015-04-19
Abstract: In this paper, we consider an acoustic scenario with a desired source and a directional interference picked up by hearing devices in a noisy and reverberant environment. We present an extension of the binaural multichannel Wiener filter (BMWF) that adds an interference rejection constraint to its cost function, in order to combine the advantages of spatial and spectral filtering while mitigating directional interference. We prove that this algorithm can be decomposed into the binaural linearly constrained minimum variance (BLCMV) algorithm followed by a single-channel Wiener post-filter. The proposed algorithm yields better interference rejection than the BMWF. Moreover, by utilizing spectral information about the sources, it achieves better SNR measures than the BLCMV.
Citations: 9
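
The decomposition result has a well-known unconstrained counterpart that is easy to check numerically. For a single target with a rank-1 speech correlation, the multichannel Wiener filter equals an MVDR beamformer followed by a single-channel Wiener post-filter. The paper's BMWF adds an interference-rejection constraint, so its decomposition uses the BLCMV instead, and that case is not shown here. The sketch uses two microphones and assumes mic 0 is the left reference and mic 1 the right. The transfer function and noise covariance are made up, and function names are hypothetical.

```python
# Sketch: for a single target with rank-1 speech correlation, the multichannel
# Wiener filter equals an MVDR beamformer followed by a single-channel Wiener
# post-filter. Unconstrained case only, 2 mics (mic 0 = left, mic 1 = right
# reference, assumed); the paper's BMWF adds an interference constraint.

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def inner(u, v):
    return sum(x.conjugate() * y for x, y in zip(u, v))


def mwf(a, sigma_s2, rn, ref):
    """Direct MWF w = R_y^{-1} sigma_s2 a conj(a_ref), R_y = sigma_s2 a a^H + R_n."""
    ry = [[sigma_s2 * a[i] * a[j].conjugate() + rn[i][j] for j in range(2)]
          for i in range(2)]
    return [sigma_s2 * x * a[ref].conjugate() for x in matvec(inv2(ry), a)]


def mvdr_then_postfilter(a, sigma_s2, rn, ref):
    rn_inv_a = matvec(inv2(rn), a)
    rho = inner(a, rn_inv_a).real                              # a^H R_n^{-1} a
    w_mvdr = [x * a[ref].conjugate() / rho for x in rn_inv_a]  # distortionless at ref
    gain = sigma_s2 / (sigma_s2 + 1.0 / rho)    # Wiener gain for output SNR sigma_s2*rho
    return [gain * x for x in w_mvdr]


a = [1.0 + 0j, 0.6 - 0.8j]                      # toy relative transfer function
rn = [[1.0, 0.3 + 0.1j], [0.3 - 0.1j, 0.5]]     # toy Hermitian noise covariance
w_left = mwf(a, 2.0, rn, ref=0)
w_left_dec = mvdr_then_postfilter(a, 2.0, rn, ref=0)
w_right = mwf(a, 2.0, rn, ref=1)
w_right_dec = mvdr_then_postfilter(a, 2.0, rn, ref=1)
```

Both routes give the same weights for either reference microphone, which follows from the matrix inversion lemma. The binaural aspect enters only through the choice of reference: one filter per ear.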