{"title":"Generalized Wiener filtering with fractional power spectrograms","authors":"A. Liutkus, R. Badeau","doi":"10.1109/ICASSP.2015.7177973","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7177973","url":null,"abstract":"In recent years, many studies have focused on the single-sensor separation of independent waveforms using so-called soft-masking strategies, where the short-term Fourier transform of the mixture is multiplied element-wise by a ratio of spectrogram models. When the signals are wide-sense stationary, this strategy is theoretically justified as optimal Wiener filtering: the power spectrograms of the sources are assumed to add up to yield the power spectrogram of the mixture. However, experience shows that using fractional spectrograms instead, such as the amplitude, yields good performance in practice, because they experimentally better fit the additivity assumption. To the best of our knowledge, no probabilistic interpretation of this filtering procedure was available to date. In this paper, we show that assuming the additivity of fractional spectrograms for the purpose of building soft-masks can be understood as separating locally stationary α-stable harmonizable processes, α-harmonizable in short, thus justifying the procedure theoretically.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123441112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?","authors":"A. Alkhateeb, G. Leus, R. Heath","doi":"10.1109/ICASSP.2015.7178503","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178503","url":null,"abstract":"Millimeter wave (mmWave) systems will likely employ directional beamforming with large antenna arrays at both the transmitters and receivers. Acquiring channel knowledge to design these beamformers, however, is challenging due to the large antenna arrays and small signal-to-noise ratio before beamforming. In this paper, we propose and evaluate a downlink system operation for multi-user mmWave systems based on compressed sensing channel estimation and conjugate analog beamforming. Adopting the achievable sum-rate as a performance metric, we show how many compressed sensing measurements are needed to approach the perfect channel knowledge performance. The results illustrate that the proposed algorithm requires an order of magnitude less training overhead compared with traditional lower-frequency solutions, while employing mmWave-suitable hardware. They also show that the number of measurements needs to be optimized to handle the trade-off between the channel estimate quality and the training overhead.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125354564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-frame factorisation for long-span acoustic modelling","authors":"Liang Lu, S. Renals","doi":"10.1109/ICASSP.2015.7178841","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178841","url":null,"abstract":"Acoustic models based on Gaussian mixture models (GMMs) typically use short-span acoustic feature inputs. This does not capture long-term temporal information from speech, owing to the conditional independence assumption of hidden Markov models. In this paper, we present an implicit approach that approximates the joint distribution of long-span features by a product of factorised models, in contrast to deep neural networks (DNNs), which model feature correlations directly. The approach is applicable to a broad range of acoustic models. We present experiments using GMM and probabilistic linear discriminant analysis (PLDA) based models on Switchboard, observing consistent word error rate reductions.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126850364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weak interference direction of arrival estimation in the GPS L1 frequency band","authors":"Zili Xu, M. Trinkle, D. Gray","doi":"10.1109/ICASSP.2015.7178451","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178451","url":null,"abstract":"Due to its low received power, a GPS signal is vulnerable to both intentional and unintentional interference. In this paper, the problem of estimating the direction of arrival of a weak GPS interference, which has the same power level as the GPS signals or is even weaker than them, using a GPS antenna array is considered. To achieve this, a multiple subspace projection algorithm is proposed to cancel GPS signals, which are treated as relatively strong interfering sources. Comparisons with the Partitioned Subspace Projection (PSP) method are presented using simulations. Experimental results show that the DOA of an interference with an SNR of -20dB in the GPS L1 band can be accurately estimated.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115024696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep neural network approach to speech bandwidth expansion","authors":"Kehuang Li, Chin-Hui Lee","doi":"10.1109/ICASSP.2015.7178801","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178801","url":null,"abstract":"We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) by estimating the spectral mapping function from narrowband (4 kHz in bandwidth) to wideband (8 kHz in bandwidth). Log-spectrum power is used as the input and output features to perform the required nonlinear transformation, and DNNs are trained to realize this high-dimensional mapping function. When evaluating the proposed approach on a large-scale 10-hour test set, we found that the DNN-expanded speech signals give excellent objective quality measures in terms of segmental signal-to-noise ratio and log-spectral distortion when compared with conventional BWE based on Gaussian mixture models (GMMs). Subjective listening tests also give a 69% preference score for DNN-expanded speech over 31% for GMM when the phase information is assumed known. For tests in real operation, when the phase information is imaged from the given narrowband signal, the preference comparison goes up to 84% versus 16%. A correct phase recovery can further increase the BWE performance for the proposed DNN method.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116041949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new study of GMM-SVM system for text-dependent speaker recognition","authors":"Hanwu Sun, Kong-Aik Lee, B. Ma","doi":"10.1109/ICASSP.2015.7178761","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178761","url":null,"abstract":"This paper presents a new approach and a study of the GMM-SVM system for text-dependent speaker recognition in the scenario of fixed pass-phrases. A uniform-split content-based GMM-SVM system is proposed and applied to text-dependent speaker evaluation. We conducted a detailed study of the proposed method compared to the baseline GMM-SVM system on the RSR2015 database, which was designed and collected for the evaluation of text-dependent speaker verification systems. The experimental results show that the new approach can significantly reduce the detection error of the target-wrong error type (i.e., target speaker with wrong pass-phrase) while maintaining a low detection error for both imposter-correct and imposter-wrong error types (i.e., imposter with correct pass-phrase and imposter with wrong pass-phrase). We also show that score normalization could be applied with respect to the imposter-wrong distribution as opposed to the imposter-correct distribution.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116388287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective hole-filling for depth-image based rendering","authors":"Adriano Q. de Oliveira, Guilherme P. Fickel, M. Walter, C. Jung","doi":"10.1109/ICASSP.2015.7178157","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178157","url":null,"abstract":"One of the biggest challenges in view interpolation is to fill the regions without projective information in the synthesized view. In this paper, we present a new approach that identifies and corrects different types of missing information. In the first stage, we propose a fast solution to tackle the problems of cracks and ghosting, common artifacts in the view interpolation process. Then, we complete larger holes by exploring the disparity map as an additional cue to select the best patch in a patch-based inpainting procedure. Our experimental results indicate that we were able to outperform current state-of-the-art hole-filling techniques for view interpolation.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116390474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition","authors":"Hengguan Huang, K. Sim","doi":"10.1109/ICASSP.2015.7178844","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178844","url":null,"abstract":"The conventional short-term interval features used by Deep Neural Networks (DNNs) lack the ability to capture longer-term information. This poses a challenge for training a speaker-independent (SI) DNN, since the short-term features do not provide sufficient information for the DNN to estimate the real robust factors of speaker-level variations. The key to this problem is to obtain a sufficiently robust and informative speaker representation. This paper compares several speaker representations. Firstly, a DNN speaker classifier is used to extract the bottleneck features as the speaker representation, called the Bottleneck Speaker Vector (BSV). To further improve the robustness of this representation, a first-order Bottleneck Speaker Super Vector (BSSV) is also proposed, where the BSV is expanded into a super vector space by incorporating the phoneme posterior probabilities. Finally, a more fine-grained speaker representation based on the FMLLR-shifted features is examined. The experimental results on the WSJ0 and WSJ1 datasets show that the proposed speaker representations are useful in normalising the speaker effects for robust DNN-based automatic speech recognition. The best performance is achieved by augmenting both the BSSV and the FMLLR-shifted representations, yielding 10.0% - 15.3% relative performance gains over the SI DNN baseline.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116394729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust overlapped speech detection and its application in word-count estimation for Prof-Life-Log data","authors":"Navid Shokouhi, A. Ziaei, A. Sangwan, J. Hansen","doi":"10.1109/ICASSP.2015.7178867","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178867","url":null,"abstract":"The ability to estimate the number of words spoken by an individual over a certain period of time is valuable in second language acquisition, healthcare, and assessing language development. However, establishing a robust automatic framework to achieve high accuracy is non-trivial in realistic/naturalistic scenarios due to various factors such as different styles of conversation or types of noise that appear in audio recordings, especially in multi-party conversations. In this study, we propose a noise-robust overlapped speech detection algorithm to estimate the likelihood of overlapping speech in a given audio file in the presence of environment noise. This information is embedded into a word-count estimator, which uses a linear minimum mean square estimator (LMMSE) to predict the number of words from the syllable rate. Syllables are detected using a modified version of the mrate algorithm. The proposed word-count estimator is tested on long-duration files from the Prof-Life-Log corpus. Data is recorded using a LENA recording device, worn by a primary speaker in various environments and under different noise conditions. The overlap detection system significantly outperforms baseline performance in noisy conditions. Furthermore, applying overlap detection results to word-count estimation achieves a 35% relative improvement over our previous efforts, which included speech enhancement using spectral subtraction and silence removal.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122385683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binaural multichannel Wiener filter with directional interference rejection","authors":"E. Hadad, Daniel Marquardt, S. Doclo, S. Gannot","doi":"10.1109/ICASSP.2015.7178048","DOIUrl":"https://doi.org/10.1109/ICASSP.2015.7178048","url":null,"abstract":"In this paper we consider an acoustic scenario with a desired source and a directional interference picked up by hearing devices in a noisy and reverberant environment. We present an extension of the binaural multichannel Wiener filter (BMWF) that adds an interference rejection constraint to its cost function, in order to combine the advantages of spatial and spectral filtering while mitigating directional interferences. We prove that this algorithm can be decomposed into the binaural linearly constrained minimum variance (BLCMV) algorithm followed by a single-channel Wiener post-filter. The proposed algorithm yields improved interference rejection capabilities compared with the BMWF. Moreover, by utilizing the spectral information on the sources, it demonstrates better SNR measures than the BLCMV.","PeriodicalId":117666,"journal":{"name":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122711037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}