2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

A multi-channel corpus for distant-speech interaction in presence of known interferences
E. Zwyssig, M. Ravanelli, P. Svaizer, M. Omologo
Abstract: This paper describes a new corpus of multi-channel audio data designed for the study and development of distant-speech recognition systems able to cope with known interfering sounds propagating in an environment. The corpus consists of both real and simulated signals and a corresponding detailed annotation. An extensive set of speech recognition experiments was conducted using three different acoustic echo cancellation (AEC) techniques to establish baseline results for future reference. The AEC techniques were applied both to single distant-microphone input signals and to beamformed signals generated using two state-of-the-art beamforming techniques. We show that speech recognition performance with the different techniques is comparable for the simulated and real data, demonstrating the usefulness of this corpus for speech research. We also show that a significant improvement in speech recognition performance can be obtained by combining state-of-the-art AEC and beamforming techniques, compared with using a single distant-microphone input.
Citations: 7
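The evaluation pipeline described above applies AEC to both single-microphone and beamformed signals. The following sketch illustrates that idea with a generic NLMS echo canceller and a simple delay-and-sum beamformer; it is not the paper's implementation, and all signals, delays, and parameter values are hypothetical placeholders.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by an integer sample delay and average."""
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

def nlms_aec(mic, ref, taps=256, mu=0.5, eps=1e-8):
    """Cancel a known interference 'ref' from 'mic' with an NLMS adaptive filter."""
    w = np.zeros(taps)
    out = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = ref[n - taps:n][::-1]          # most recent reference samples
        echo_est = w @ x                   # estimated interference at the mic
        e = mic[n] - echo_est              # error = cleaned signal sample
        w += mu * e * x / (x @ x + eps)    # normalised LMS update
        out[n] = e
    return out

# Hypothetical usage: a 4-channel recording plus the known interference waveform.
fs, T = 16000, 2.0
t = np.arange(int(fs * T)) / fs
interference = 0.3 * np.sin(2 * np.pi * 440 * t)
channels = [0.05 * np.random.randn(len(t)) + np.roll(interference, d)
            for d in (0, 2, 4, 6)]
beamformed = delay_and_sum(channels, delays=[0, 2, 4, 6])
cleaned = nlms_aec(beamformed, interference)
```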
Exemplar-based large vocabulary speech recognition using k-nearest neighbors
Yanbo Xu, O. Siohan, David Simcha, Sanjiv Kumar, H. Liao
Abstract: This paper describes a large-scale exemplar-based acoustic modeling approach for large vocabulary continuous speech recognition. We construct an index of labeled training frames using high-level features extracted from the bottleneck layer of a deep neural network as indexing features. At recognition time, each test frame is turned into a query and a set of k-nearest-neighbor frames is retrieved from the index. This set is further filtered using majority voting, and the remaining frames are used to derive an estimate of the context-dependent state posteriors of the query, which can then be used for recognition. Using an approximate nearest-neighbor search approach based on asymmetric hashing, we are able to construct an index on over 25,000 hours of training data. We present both frame classification and recognition experiments on a Voice Search task.
Citations: 6
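The retrieval step described above can be sketched as follows, using a brute-force nearest-neighbour search in place of the paper's asymmetric-hashing index; the feature dimension, state inventory, and vote threshold below are illustrative placeholders.

```python
import numpy as np
from collections import Counter

def knn_state_posteriors(query, index_feats, index_labels, k=50,
                         n_states=1000, min_votes=3):
    """Estimate context-dependent state posteriors for one query frame from its
    k nearest labelled training frames.  Brute-force search is used here for
    clarity; the paper retrieves neighbours with approximate search based on
    asymmetric hashing."""
    dists = np.linalg.norm(index_feats - query, axis=1)
    neighbours = index_labels[np.argsort(dists)[:k]]
    # Majority-vote style filtering: discard labels supported by too few neighbours.
    votes = Counter(neighbours)
    kept = [lab for lab in neighbours if votes[lab] >= min_votes]
    # Posterior estimate = normalised label counts over the remaining frames.
    post = np.zeros(n_states)
    for lab in kept:
        post[lab] += 1.0
    return post / post.sum() if post.sum() > 0 else np.full(n_states, 1.0 / n_states)

# Hypothetical usage with random bottleneck features and state labels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((10000, 40))     # indexed training frames
labels = rng.integers(0, 1000, size=10000)   # context-dependent state ids
posteriors = knn_state_posteriors(rng.standard_normal(40), feats, labels)
```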
A virtual bass system with improved overflow control
Hao Mu, W. Gan
Abstract: The virtual bass system (VBS) can enhance the bass performance of small or flat loudspeakers by tricking the human brain into perceiving the fundamental frequency from its higher harmonics. However, the additional harmonics may lead to arithmetic overflow and cause distortion due to clipping, especially during high-level transient components. Past research has paid little attention to this problem, and manual control of the VBS gain settings is required to prevent overflow. Users must manually adjust the gain settings for different sound tracks, which is cumbersome. In this paper, we propose a VBS that efficiently prevents overflow by automatically controlling the gain settings for the additional harmonics. The new approach pre-computes the gain limitation for the additional harmonics and can be adopted for real-time audio implementation. Objective measurements are carried out to compare the proposed method with the commonly used limiter method.
Citations: 0
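A minimal sketch of the idea of pre-computing a gain limit so that the added harmonics cannot push a block past full scale; the per-sample bound used here is only an illustration and is not the gain-control rule proposed in the paper.

```python
import numpy as np

def max_safe_gain(signal, harmonics, limit=1.0):
    """Largest gain g such that signal + g*harmonics stays within 'limit' in
    absolute value (a simple per-block bound illustrating the idea of
    pre-computing a gain limit)."""
    headroom = np.maximum(limit - np.abs(signal), 0.0)
    mask = np.abs(harmonics) > 1e-12
    if not np.any(mask):
        return np.inf
    return float(np.min(headroom[mask] / np.abs(harmonics[mask])))

def add_harmonics_block(signal, harmonics, desired_gain):
    """Add the generated harmonics with the desired gain, capped by the safe gain."""
    g = min(desired_gain, max_safe_gain(signal, harmonics))
    return signal + g * harmonics

# Hypothetical usage on one processing block.
t = np.arange(1024) / 48000.0
block = 0.8 * np.sin(2 * np.pi * 60 * t)      # low-frequency content
harm = 0.5 * np.sin(2 * np.pi * 120 * t)      # generated 2nd harmonic
out = add_harmonics_block(block, harm, desired_gain=1.0)
assert np.max(np.abs(out)) <= 1.0 + 1e-9      # no overflow after the capped gain
```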
Word embedding for recurrent neural network based TTS synthesis
Peilu Wang, Yao Qian, F. Soong, Lei He, Zhao Hai
Abstract: Current state-of-the-art TTS synthesis can produce synthesized speech of high quality when rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (ToBI), are time-consuming to obtain because they are manually labeled, with high inconsistency among annotators. In this paper, we investigate the use of word embeddings, which represent a word with a low-dimensional continuous-valued vector assumed to carry syntactic and semantic information, for bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) based TTS synthesis. Experimental results show that word embeddings can significantly improve the performance of BLSTM-RNN based TTS synthesis without using ToBI and part-of-speech (POS) features.
Citations: 54
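A small sketch of the input-side change the paper describes: each phone-level linguistic feature vector is augmented with the embedding of the word it belongs to before being fed to the BLSTM-RNN. The vocabulary, embedding table, and feature sizes below are made-up placeholders.

```python
import numpy as np

# Hypothetical pre-trained word embeddings (in practice learned from large text corpora).
emb_dim = 8
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
embeddings = np.random.randn(len(vocab), emb_dim).astype(np.float32)

def tts_input_features(phone_feats, word_of_phone, words):
    """Concatenate each phone-level linguistic feature vector with the embedding
    of the word it belongs to, producing the augmented input sequence that a
    BLSTM-RNN acoustic model would consume (sketch only)."""
    rows = []
    for feats, w_idx in zip(phone_feats, word_of_phone):
        word = words[w_idx]
        e = embeddings[vocab.get(word, vocab["<unk>"])]
        rows.append(np.concatenate([feats, e]))
    return np.stack(rows)

# Hypothetical utterance: 5 phones spread over 3 words, 20 base linguistic features each.
phone_feats = np.random.randn(5, 20).astype(np.float32)
word_of_phone = [0, 0, 1, 2, 2]
X = tts_input_features(phone_feats, word_of_phone, ["the", "cat", "sat"])
print(X.shape)   # (5, 28): 20 linguistic features + 8 embedding dimensions
```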
An asymptotic LMPI test for cyclostationarity detection with application to cognitive radio
D. Ramírez, P. Schreier, J. Vía, I. Santamaría, L. Scharf
Abstract: We propose a new detector of primary users in cognitive radio networks. The main novelty of the proposed detector, compared with most known detectors, is that it is based on sound statistical principles for detecting cyclostationary signals. In particular, the proposed detector is (asymptotically) the locally most powerful invariant test, i.e., the best invariant detector at low signal-to-noise ratios. The derivation rests on two main ideas: the relationship between a scalar-valued cyclostationary signal and a vector-valued wide-sense stationary signal, and Wijsman's theorem. Moreover, using the spectral representation of the cyclostationary time series, the detector admits an insightful interpretation, and implementation, as the broadband coherence between frequencies separated by multiples of the cycle frequency. Finally, simulations confirm that the proposed detector performs better than previous approaches.
Citations: 2
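The "broadband coherence between frequencies separated by multiples of the cycle frequency" interpretation suggests a simple statistic like the one sketched below; this is only an illustrative detector inspired by that interpretation, not the asymptotic LMPI test derived in the paper, and the signal model and parameters are hypothetical.

```python
import numpy as np

def cyclic_coherence_stat(x, cycle_bins, nfft=256, hop=128):
    """Detection statistic based on the coherence between spectral components
    separated by the cycle frequency (illustrative only)."""
    # Split into windowed segments and take DFTs.
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, hop)]
    X = np.fft.fft(np.array(segs) * np.hanning(nfft), axis=1)
    # Cross- and auto-spectra averaged over segments, for bins k and k + cycle_bins.
    Xs, Xc = X, np.roll(X, -cycle_bins, axis=1)
    cross = np.mean(Xs * np.conj(Xc), axis=0)
    p1 = np.mean(np.abs(Xs) ** 2, axis=0)
    p2 = np.mean(np.abs(Xc) ** 2, axis=0)
    coh = np.abs(cross) ** 2 / (p1 * p2 + 1e-12)
    return float(np.mean(coh))   # broadband average as the test statistic

rng = np.random.default_rng(1)
n = 8192
noise = rng.standard_normal(n)
bpsk = np.repeat(rng.choice([-1.0, 1.0], n // 16), 16)   # cyclostationary signal, period 16
print(cyclic_coherence_stat(noise, cycle_bins=16),
      cyclic_coherence_stat(noise + bpsk, cycle_bins=16))
```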
Relative group sparsity for non-negative matrix factorization with application to on-the-fly audio source separation
Dalia El Badawy, A. Ozerov, Ngoc Q. K. Duong
Abstract: We consider dictionary-based signal decompositions with group sparsity, a variant of structured sparsity. We point out that the group sparsity-inducing constraint alone may not be sufficient in some cases, when we know that some bigger groups, so-called supergroups, cannot vanish completely. To deal with this problem we introduce the notion of relative group sparsity, which prevents the supergroups from vanishing. In this paper we formulate practical criteria and algorithms for relative group sparsity as applied to non-negative matrix factorization and investigate its potential benefit within the on-the-fly audio source separation framework we recently introduced. Experimental evaluation shows that the proposed relative group sparsity improves over group sparsity in both supervised and semi-supervised on-the-fly audio source separation settings.
Citations: 8
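The contrast between group sparsity and the proposed relative group sparsity can be illustrated with the schematic penalties below, where each group norm is normalised by its supergroup norm so that the supergroup cannot vanish entirely; the exact penalty and NMF update rules in the paper may differ from this sketch, and the group layout is hypothetical.

```python
import numpy as np

def group_sparsity_penalty(H, groups):
    """Classic mixed-norm group-sparsity penalty: sum of l2 norms of
    row-groups of the NMF activation matrix H."""
    return sum(np.linalg.norm(H[g, :]) for g in groups)

def relative_group_sparsity_penalty(H, groups, supergroups, group_to_super, eps=1e-12):
    """Schematic 'relative' variant: each group norm is divided by the norm of
    the supergroup containing it, so shrinking a group only pays off relative
    to its supergroup and the supergroup itself is discouraged from vanishing."""
    total = 0.0
    for gi, g in enumerate(groups):
        sg = supergroups[group_to_super[gi]]
        total += np.linalg.norm(H[g, :]) / (np.linalg.norm(H[sg, :]) + eps)
    return total

# Hypothetical activations: 2 sources (supergroups), 3 components (groups) each.
rng = np.random.default_rng(2)
H = np.abs(rng.standard_normal((6, 50)))
groups = [[0], [1], [2], [3], [4], [5]]
supergroups = [[0, 1, 2], [3, 4, 5]]
group_to_super = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(group_sparsity_penalty(H, groups),
      relative_group_sparsity_penalty(H, groups, supergroups, group_to_super))
```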
SNR maximization hashing for learning compact binary codes
Honghai Yu, P. Moulin
Abstract: In this paper, we propose a novel robust hashing algorithm based on signal-to-noise ratio (SNR) maximization to learn binary codes. We first motivate SNR maximization for robust hashing in a statistical model under which maximizing the SNR minimizes the robust hashing error probability. A globally optimal solution can be obtained by solving a generalized eigenvalue problem. The proposed algorithm is tested on both synthetic and real datasets, showing significant performance gains over existing hashing algorithms.
Citations: 1
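The SNR-maximisation formulation reduces to a generalized eigenvalue problem, which can be sketched as follows; the way the "signal" and "noise" covariances are estimated here (from clean/noisy feature pairs) is an assumption for illustration, not necessarily the paper's construction.

```python
import numpy as np
from scipy.linalg import eigh

def snr_max_hash(X, X_noisy, n_bits=16):
    """Learn hash projections by maximising w'Sw / w'Nw, where S is the
    covariance of the clean features and N the covariance of the distortion
    between clean and noisy copies; a sketch of the generalised eigenvalue
    formulation, not the paper's full algorithm."""
    Xc = X - X.mean(axis=0)
    D = X_noisy - X                                   # distortion/noise component
    S = Xc.T @ Xc / len(X)
    N = D.T @ D / len(X) + 1e-6 * np.eye(X.shape[1])  # regularised noise covariance
    # Generalised eigenproblem S w = lambda N w; keep the top n_bits eigenvectors.
    vals, vecs = eigh(S, N)
    W = vecs[:, np.argsort(vals)[::-1][:n_bits]]
    codes = (Xc @ W > 0).astype(np.uint8)             # binary codes from projection signs
    return W, codes

# Hypothetical usage with synthetic clean/noisy feature pairs.
rng = np.random.default_rng(3)
X = rng.standard_normal((500, 32))
X_noisy = X + 0.1 * rng.standard_normal(X.shape)
W, codes = snr_max_hash(X, X_noisy, n_bits=16)
print(codes.shape)    # (500, 16)
```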
The proportional mean decomposition: A bridge between the Gaussian and Bernoulli ensembles
Samet Oymak, B. Hassibi
Abstract: We consider ill-posed linear inverse problems involving the estimation of structured sparse signals. When the sensing matrix has i.i.d. standard normal entries, there is a full-fledged theory on sample complexity and robustness. In this work, we propose a way of using this theory to obtain good bounds for the i.i.d. Bernoulli ensemble. We first provide a deterministic relation between the two ensembles that relates their restricted singular values. We then show how to obtain non-asymptotic results with small constants for the Bernoulli ensemble. While our discussion focuses on Bernoulli measurements, the main idea can be extended to any discrete distribution with little difficulty.
Citations: 1
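A rough numerical illustration of the "comparable behaviour" message, using the smallest singular value of random column submatrices as a crude stand-in for restricted singular values; the paper's actual contribution is a deterministic relation between the ensembles, which this sketch does not reproduce, and all dimensions are arbitrary.

```python
import numpy as np

def min_restricted_singular_value(ensemble, m=60, n=200, s=10, trials=200, seed=0):
    """Average smallest singular value of random m x s column submatrices of a
    scaled measurement matrix, for Gaussian versus symmetric Bernoulli entries."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        if ensemble == "gaussian":
            A = rng.standard_normal((m, n))
        else:  # symmetric +/-1 Bernoulli entries
            A = rng.choice([-1.0, 1.0], size=(m, n))
        cols = rng.choice(n, size=s, replace=False)
        vals.append(np.linalg.svd(A[:, cols] / np.sqrt(m), compute_uv=False).min())
    return float(np.mean(vals))

# The two ensembles give similar values under this crude proxy.
print(min_restricted_singular_value("gaussian"),
      min_restricted_singular_value("bernoulli"))
```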
Noise robust estimation of the voice source using a deep neural network
Manu Airaksinen, T. Raitio, P. Alku
Abstract: In the analysis of speech production, information about the voice source can be obtained non-invasively with glottal inverse filtering (GIF) methods. Current state-of-the-art GIF methods are capable of producing high-quality estimates in suitable conditions (e.g., low noise and reverberation), but their performance deteriorates in non-ideal conditions because they require noise-sensitive parameter estimation. This study proposes a method for noise-robust estimation of the voice source by using a deep neural network (DNN) to learn a mapping between robust low-level speech features and the desired reference, a time-domain glottal flow computed by a GIF method. The method was evaluated against two GIF methods, of which one (quasi-closed phase analysis, QCP) requires additional parameter estimation and the other (iterative adaptive inverse filtering, IAIF) does not. The results show that the proposed method outperforms the QCP method at SNRs below 50-20 dB, but outperforms the simpler IAIF method only at very low SNRs.
Citations: 10
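A minimal sketch of the feature-to-glottal-flow regression described above, using scikit-learn's MLPRegressor and random arrays as stand-ins for the speech features and the GIF-derived reference flows; the network size, feature set, and targets are all assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch of the mapping: train a small neural network to map frame-level speech
# features to a reference glottal flow waveform produced by a GIF method.
# Random placeholders stand in for real features and QCP-derived targets.
rng = np.random.default_rng(4)
n_frames, n_feats, flow_len = 500, 30, 200
speech_feats = rng.standard_normal((n_frames, n_feats))      # noise-robust features
reference_flow = rng.standard_normal((n_frames, flow_len))   # e.g. GIF-derived targets

dnn = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=30, random_state=0)
dnn.fit(speech_feats, reference_flow)

# At test time, the trained network predicts the glottal flow directly from
# features of (possibly noisy) speech, avoiding noise-sensitive GIF parameters.
predicted_flow = dnn.predict(rng.standard_normal((5, n_feats)))
print(predicted_flow.shape)   # (5, 200)
```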
A language-based generative model framework for behavioral analysis of couples' therapy
Sandeep Nallan Chakravarthula, Rahul Gupta, Brian R. Baucom, P. Georgiou
Abstract: Observational studies for psychological evaluation rely on careful assessment of multiple behavioral cues. Recent studies have made good progress in automating psychological evaluation, which often involves tedious manual annotation of a set of behavioral codes. However, current methods impose strict and often unnatural assumptions for evaluation. In this work, we specifically investigate two goals: (1) human behavior changes throughout an interaction, and better models of this evolution can improve automated behavioral annotation; and (2) human perception of this evolution can be quite complex and non-linear, and better techniques than averaging need to be investigated. For this purpose, we propose a Dynamic Behavior Modeling (DBM) scheme, which models a spouse as undergoing changes in behavioral state within a session, and contrast it against a Static Behavior Model (SBM), which allows only a constant session-long behavioral state. We use negativity in a couples-therapy task as our case study. We present results and analysis on both models for capturing local behavior information and predicting the session-level negativity label.
Citations: 19
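The DBM/SBM contrast can be sketched with a toy two-state model over word tokens: the static model scores a session with a single unigram state, while the dynamic model lets the behavioural state switch within the session (a scaled HMM forward pass). All probabilities, token ids, and state labels below are hypothetical and are not taken from the paper.

```python
import numpy as np

def static_loglik(tokens, emit):
    """Static Behavior Model: one behavioural state for the whole session,
    so the session log-likelihood is a single unigram sum."""
    return float(np.sum(np.log(emit[tokens])))

def dynamic_loglik(tokens, pi, trans, emit):
    """Dynamic Behavior Model (sketch): an HMM whose hidden state is the
    spouse's behavioural state, allowed to change from word to word.
    Scaled forward algorithm; returns the session log-likelihood."""
    alpha = pi * emit[:, tokens[0]]
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in tokens[1:]:
        alpha = (alpha @ trans) * emit[:, t]
        logp += np.log(alpha.sum())
        alpha /= alpha.sum()
    return float(logp)

# Hypothetical 2-state model: state 0 = negative, state 1 = non-negative behaviour.
pi = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])                        # sticky, but switching is allowed
emit = np.array([[0.4, 0.3, 0.1, 0.1, 0.05, 0.05],    # negative state favours tokens 0-1
                 [0.05, 0.05, 0.1, 0.1, 0.3, 0.4]])   # non-negative favours tokens 4-5
session = np.array([0, 1, 0, 4, 5, 5, 4, 0, 1])       # spouse switches mid-session
print(static_loglik(session, emit[0]),
      dynamic_loglik(session, pi, trans, emit))
```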