2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): latest papers

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks
Tara N. Sainath, Oriol Vinyals, A. Senior, H. Sak
DOI: 10.1109/ICASSP.2015.7178838 (https://doi.org/10.1109/ICASSP.2015.7178838)
Publication date: 2015-09-08
Abstract: Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space. In this paper, we take advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture. We explore the proposed architecture, which we call CLDNN, on a variety of large vocabulary tasks, varying from 200 to 2,000 hours. We find that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.
Citations: 1222
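
The abstract describes a pipeline in which convolution reduces frequency variation, an LSTM models time, and fully connected layers map to a more separable output space. Below is a minimal NumPy forward pass in that order, as a sketch only: one layer of each type, random untrained weights, and hypothetical sizes (40 mel bins, 8 filters, 32 LSTM cells, 5 classes). The paper's actual layer counts, dimensions, and training setup are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def freq_conv_maxpool(x, filters, pool):
    """Convolve each frame along frequency (ReLU), then max-pool along frequency.
    x: (T, F) log-mel frames; filters: (K, W) hypothetical 1-D frequency filters."""
    T, _ = x.shape
    K, W = filters.shape
    windows = np.lib.stride_tricks.sliding_window_view(x, W, axis=1)  # (T, F-W+1, W)
    conv = np.maximum(windows @ filters.T, 0.0)                       # (T, F-W+1, K)
    n_pool = conv.shape[1] // pool
    pooled = conv[:, :n_pool * pool].reshape(T, n_pool, pool, K).max(axis=2)
    return pooled.reshape(T, -1)                                      # (T, n_pool*K)

def lstm(xs, Wx, Wh, b):
    """Single-layer LSTM over time; stacked gate order is i, f, o, g."""
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for frame in xs:
        z = frame @ Wx + h @ Wh + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)                                          # (T, H)

def dnn_softmax(hs, W1, b1, W2, b2):
    """One ReLU hidden layer followed by a softmax output layer."""
    a = np.maximum(hs @ W1 + b1, 0.0)
    logits = a @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical sizes: 10 frames, 40 mel bins, 8 filters of width 8, pool 3,
# 32 LSTM cells, 64 hidden units, 5 output classes; weights are random.
rng = np.random.default_rng(0)
T, F, K, W, POOL, H, HID, C = 10, 40, 8, 8, 3, 32, 64, 5
x = rng.standard_normal((T, F))
feat = freq_conv_maxpool(x, 0.1 * rng.standard_normal((K, W)), POOL)
D = feat.shape[1]
hs = lstm(feat, 0.1 * rng.standard_normal((D, 4 * H)),
          0.1 * rng.standard_normal((H, 4 * H)), np.zeros(4 * H))
posteriors = dnn_softmax(hs, 0.1 * rng.standard_normal((H, HID)), np.zeros(HID),
                         0.1 * rng.standard_normal((HID, C)), np.zeros(C))
print(posteriors.shape)  # (10, 5)
```

The ordering is the point of the sketch: the LSTM sees pooled, frequency-shift-tolerant features rather than raw frames, and the dense layers see the LSTM's temporal summary.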

On transmit beamforming in MIMO radar with matrix completion
Shunqiao Sun, A. Petropulu
DOI: 10.1109/ICASSP.2015.7178476 (https://doi.org/10.1109/ICASSP.2015.7178476)
Publication date: 2015-08-06
Abstract: The paper proposes a matrix completion based colocated MIMO radar (MIMO-MC) approach that employs transmit beamforming. The transmit antennas transmit correlated waveforms to illuminate certain directions. Each receive antenna performs sub-Nyquist sampling of the target returns at uniformly random times, and forwards the samples to a fusion center along with information on the sampling times. Based on the forwarded samples, the fusion center partially fills a matrix, recovers the Nyquist rate samples via matrix completion, and subsequently proceeds with target estimation via standard techniques. The performance of matrix completion depends on the matrix coherence. The paper derives the relations between transmit waveforms and matrix coherence. Specifically, it is shown that, for a rank-1 beamformer, the coherence is optimal, i.e., 1, if and only if the waveforms are unimodular. For a multi-rank beamformer, the coherence of the row space of the data matrix is optimal if the waveform power is constant across each snapshot. Simulation results show that the proposed scheme achieves high resolution with a significantly reduced number of samples.
Citations: 4
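
The rank-1 claim can be checked numerically with the standard subspace coherence used in matrix completion, mu(U) = (n/r) max_i ||P_U e_i||^2, which is at least 1. The sketch below assumes that definition and assumes, for illustration only, that the rank-1 data matrix has the form b s^T, so its row space is spanned by the waveform sequence s (the vector b and the snapshot count are hypothetical).

```python
import numpy as np

def coherence(U):
    """Coherence of the subspace spanned by the orthonormal columns of U (n x r):
    mu(U) = (n / r) * max_i ||P_U e_i||^2, which ranges from 1 to n / r."""
    n, r = U.shape
    leverage = np.sum(np.abs(U) ** 2, axis=1)  # ||P_U e_i||^2 for orthonormal U
    return (n / r) * leverage.max()

def rank1_row_space_coherence(waveform):
    """Row space of a rank-1 matrix b s^T is span(s); normalize s to get a basis."""
    s = np.asarray(waveform, dtype=complex)
    u = (s / np.linalg.norm(s)).reshape(-1, 1)
    return coherence(u)

rng = np.random.default_rng(1)
L = 64  # hypothetical number of snapshots
unimodular = np.exp(1j * 2 * np.pi * rng.random(L))  # constant modulus, random phase
non_unimodular = rng.standard_normal(L) + 1j * rng.standard_normal(L)
print(rank1_row_space_coherence(unimodular))      # 1.0 up to rounding
print(rank1_row_space_coherence(non_unimodular))  # greater than 1
```

For a single vector the coherence reduces to L * max|s_i|^2 / ||s||^2, which equals 1 exactly when every |s_i| is the same, matching the "if and only if unimodular" statement in the abstract.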

Wavelet-based compressed spectrum sensing for cognitive radio wireless networks
Hilmi E. Egilmez, Antonio Ortega
DOI: 10.1109/ICASSP.2015.7178553 (https://doi.org/10.1109/ICASSP.2015.7178553)
Publication date: 2015-08-06
Abstract: Spectrum sensing is an essential functionality of cognitive radio wireless networks (CRWNs) that enables detecting unused frequency sub-bands for dynamic spectrum access. This paper proposes a compressed spectrum sensing framework by (i) constructing a sparsity basis in the wavelet domain that helps compressed sensing at sub-Nyquist rates and (ii) applying a wavelet-based singularity detector on the reconstructed signal to identify available frequency sub-bands with low complexity. In particular, for the compressed sensing, an optimized Haar wavelet basis is employed to sparsely represent piecewise constant (PWC) signals, which closely approximate the frequency spectrum of a sensed signal. Our simulation results show that the proposed framework outperforms existing compressed spectrum sensing methods by providing higher accuracy at lower sampling rates.
Citations: 8
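
Why a Haar basis suits PWC spectra: a Haar detail coefficient is nonzero only if its support straddles a jump, so a spectrum with a few sub-band edges has very few nonzero coefficients. The sketch below uses the standard orthonormal multilevel Haar transform, not the paper's optimized Haar basis, and the 64-bin test spectrum is hypothetical.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal multilevel Haar transform; len(x) must be divisible by 2**levels.
    Returns the coarsest approximation and the detail bands, coarsest first."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail: local difference
        a = (even + odd) / np.sqrt(2)              # approximation: local average
    return a, details[::-1]

def haar_idwt(a, details):
    """Invert haar_dwt, applying detail bands from coarsest to finest."""
    for d in details:
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

# Hypothetical PWC power spectrum: three flat sub-bands with edges at bins 16 and 40.
spectrum = np.concatenate([np.zeros(16), 3.0 * np.ones(24), np.ones(24)])
approx, details = haar_dwt(spectrum, levels=6)
nonzero = sum(int(np.sum(np.abs(d) > 1e-12)) for d in details)
print(nonzero, "of", sum(d.size for d in details), "detail coefficients are nonzero")
```

Each edge can fall strictly inside at most one Haar block per level, so two edges over six levels give at most 12 nonzero detail coefficients out of 63, which is the kind of sparsity that compressed sensing exploits.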

Coprime DFT filter bank design: Theoretical bounds and guarantees
Chun-Lin Liu, P. Vaidyanathan
DOI: 10.1109/ICASSP.2015.7178694 (https://doi.org/10.1109/ICASSP.2015.7178694)
Publication date: 2015-08-06
Abstract: Coprime DFT filter banks (coprime DFTFB) achieve the effect of an MN-DFTFB by using two DFTFBs of size only M and N, where M and N are coprime integers. However, coprime DFTFBs need to be designed properly, to avoid unwanted bumps in stopbands or unsatisfactory total spectrum coverage, quantified by overall amplitude responses. In this paper, a detailed theoretical analysis is made of the tradeoffs between bumps and overall amplitude responses. It is shown that the bump level at the center frequency f_b of a bump is approximately one-fourth of the overall amplitude response at f_b. Then, a novel design is introduced based on an optimization problem pertaining to overall amplitude responses. The original problem is relaxed to a computationally tractable optimization program, which can be solved with alternating minimization algorithms. It is verified with simulations that the new designs cover the spectrum completely.
Citations: 4
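
The "MN bands from M and N" effect rests on coprimality. With ideal passbands, the expanded M-band filter H_k(z^N) passes the narrow bands centered at 2*pi*(k + pM)/(MN), and the expanded N-band filter G_l(z^M) passes those at 2*pi*(l + qN)/(MN); their product keeps only the common band. The sketch below, which ignores the non-ideal bumps the paper analyzes, checks that every pair (k, l) leaves exactly one of the MN bands (the Chinese remainder theorem).

```python
from math import gcd

def bands_of_expanded_M_filter(k, M, N):
    """Band indices (units of 2*pi/(M*N)) passed by ideal H_k(z^N)."""
    return {(k + p * M) % (M * N) for p in range(N)}

def bands_of_expanded_N_filter(l, M, N):
    """Band indices (units of 2*pi/(M*N)) passed by ideal G_l(z^M)."""
    return {(l + q * N) % (M * N) for q in range(M)}

def coprime_band_map(M, N):
    """Map each filter pair (k, l) to the set of bands its product passes."""
    assert gcd(M, N) == 1, "M and N must be coprime"
    return {
        (k, l): bands_of_expanded_M_filter(k, M, N) & bands_of_expanded_N_filter(l, M, N)
        for k in range(M)
        for l in range(N)
    }

table = coprime_band_map(3, 4)
for (k, l), bands in sorted(table.items()):
    print(f"H_{k}(z^4) * G_{l}(z^3) -> band {sorted(bands)}")
```

With M = 3 and N = 4, the 12 products land on 12 distinct bands, which is how two small DFT filter banks emulate a 12-band one; with non-coprime sizes some pairs would share bands and others would pass none.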

Low-complexity robust DOA estimation
B. Dumitrescu, Cristian Rusu, I. Tabus, J. Astola
DOI: 10.1109/ICASSP.2015.7178480 (https://doi.org/10.1109/ICASSP.2015.7178480)
Publication date: 2015-08-06
Abstract: We propose a low complexity method for estimating direction of arrival (DOA) when the positions of the array sensors are affected by errors with known magnitude bound. This robust DOA method is based on solving an optimization problem whose solution is obtained in two stages. First, the problem is relaxed and the corresponding power estimation has an expression similar to that of standard beamforming. If the relaxed solution does not satisfy the magnitude bound, an approximation is made by projection. Unlike other robust DOA methods, no eigenvalue decomposition is necessary and the complexity is similar to that of MVDR. For low and medium SNR, the proposed method competes well with more complex methods and is clearly better than MVDR.
Citations: 0
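
The abstract positions the method relative to two baselines: the standard (Bartlett) beamforming power a^H R a / (a^H a) and the MVDR power 1 / (a^H R^{-1} a). The sketch below computes only those two baseline spectra, not the authors' robust estimator, and assumes a uniform linear array with half-wavelength spacing, error-free sensor positions, and an exactly known covariance for one hypothetical source at 20 degrees.

```python
import numpy as np

def steering(theta_deg, n_sensors):
    """ULA steering vectors, half-wavelength spacing (assumption); shape (n, n_angles)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    k = np.arange(n_sensors)[:, None]
    return np.exp(1j * np.pi * k * np.sin(theta)[None, :])

def bartlett_and_mvdr(R, grid_deg):
    """Standard beamforming and MVDR power spectra over an angle grid."""
    n = R.shape[0]
    A = steering(grid_deg, n)
    # a^H R a for every grid angle at once
    bartlett = np.real(np.einsum("ia,ij,ja->a", A.conj(), R, A)) / n
    R_inv = np.linalg.inv(R)
    mvdr = 1.0 / np.real(np.einsum("ia,ij,ja->a", A.conj(), R_inv, A))
    return bartlett, mvdr

n_sensors = 8
a0 = steering(20.0, n_sensors)[:, 0]
R = 1.0 * np.outer(a0, a0.conj()) + 0.1 * np.eye(n_sensors)  # unit source + noise
grid = np.arange(-90.0, 90.5, 0.5)
p_bartlett, p_mvdr = bartlett_and_mvdr(R, grid)
print(grid[np.argmax(p_bartlett)], grid[np.argmax(p_mvdr)])
```

Both spectra peak at the true angle here because the covariance is exact; the paper's concern is what happens when the steering vectors themselves are perturbed by sensor position errors, which this sketch does not model.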

Compensating for asynchronies between musical voices in score-performance alignment
Siying Wang, Sebastian Ewert, S. Dixon
DOI: 10.1109/ICASSP.2015.7178037 (https://doi.org/10.1109/ICASSP.2015.7178037)
Publication date: 2015-08-06
Abstract: The goal of score-performance synchronisation is to align a given musical score to an audio recording of a performance of the same piece. A major challenge in computing such alignments is to account for musical parameters including the local tempo or playing style. To increase the overall robustness, current methods assume that notes occurring simultaneously in the score are played concurrently in a performance. Musical voices such as the melody, however, are often played asynchronously to other voices, which can lead to significant local alignment errors. In this paper, we present a novel method that handles asynchronies between the melody and the accompaniment by treating the voices as separate time lines in a multi-dimensional variant of dynamic time warping (DTW). Constraining the alignment with information obtained via classical DTW, our method measurably improves the alignment accuracy for pieces with asynchronous voices and preserves the accuracy otherwise.
Citations: 16
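
The method is constrained by a classical DTW alignment. A minimal sketch of classical DTW on 1-D sequences follows, with absolute difference as a stand-in local cost and steps (1,0), (0,1), (1,1); the paper's multi-dimensional, per-voice variant and its audio features are not shown.

```python
import numpy as np

def dtw(x, y):
    """Classical DTW: returns the accumulated cost and the optimal warping path."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = np.abs(np.subtract.outer(x, y))  # local cost matrix (n x m)
    D = np.full((n + 1, m + 1), np.inf)     # accumulated cost with an inf border
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]

# A "score" and a "performance" that lingers on some events.
dist, path = dtw([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
print(dist, path)
```

Classical DTW forces all simultaneous score events onto one warping path, which is exactly the assumption the paper relaxes by giving the melody and accompaniment their own time lines.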

Sparse representation for frequency warping based voice conversion
Xiaohai Tian, Zhizheng Wu, Siu Wa Lee, Nguyen Quy Hy, Chng Eng Siong, M. Dong
DOI: 10.1109/ICASSP.2015.7178769 (https://doi.org/10.1109/ICASSP.2015.7178769)
Publication date: 2015-08-06
Abstract: This paper presents a sparse representation framework for weighted frequency warping based voice conversion. In this method, a frame-dependent warping function and the corresponding spectral residual vector are first calculated for each source-target spectrum pair. At runtime, a source spectrum is factorised as a linear combination of a set of source spectra in the training data. The linear combination weight matrix, which is constrained to be sparse, is used to interpolate the frame-dependent warping functions and spectral residual vectors. In this way, the proposed method not only avoids the statistical averaging caused by GMM but also preserves the high-resolution spectral details for high-quality converted speech. Experiments are conducted on the VOICES database. Both objective and subjective results confirmed the effectiveness of the proposed method. In particular, the spectral distortion dropped from 5.55 dB for the conventional frequency warping approach to 5.0 dB for the proposed method. Compared to the state-of-the-art GMM-based conversion with global variance (GV) enhancement, our method achieved 68.5% in an AB preference test.
Citations: 21
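
The runtime step (sparse weights over training source spectra, then the same weights applied to the stored warping functions) can be sketched as below. Orthogonal matching pursuit is swapped in here as a generic sparse coder and is not necessarily the solver used in the paper; the dictionary, warping functions, and sizes are random placeholders, and the spectral residual vectors are omitted.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: approximate x with few columns of D."""
    residual = np.asarray(x, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    w = np.zeros(D.shape[1])
    w[support] = coef
    return w

def interpolate_warping(D_src, warps, x, n_nonzero=3):
    """Sparse-code a source spectrum, then blend the paired warping functions."""
    w = omp(D_src, x, n_nonzero)
    return w, w @ warps

# Hypothetical training data: 50 source spectra (20 bins) with paired warping functions.
rng = np.random.default_rng(0)
D_src = rng.random((20, 50))
D_src /= np.linalg.norm(D_src, axis=0)  # unit-norm atoms
warps = rng.random((50, 20))            # one warping function per training frame
w, warp = interpolate_warping(D_src, warps, D_src[:, 7])
print(int(np.argmax(np.abs(w))))        # index of the dominant training frame
```

Because the weights are sparse, the converted warping function is a blend of a few specific training frames rather than a global average, which is the intuition behind avoiding GMM-style over-smoothing.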

A gradient adaptive population importance sampler
V. Elvira, Luca Martino, D. Luengo, J. Corander
DOI: 10.1109/ICASSP.2015.7178737 (https://doi.org/10.1109/ICASSP.2015.7178737)
Publication date: 2015-08-06
Abstract: Monte Carlo (MC) methods are widely used in signal processing and machine learning. A well-known class of MC methods is composed of importance sampling and its adaptive extensions (e.g., population Monte Carlo). In this paper, we introduce an adaptive importance sampler using a population of proposal densities. The novel algorithm dynamically optimizes the cloud of proposals, adapting them using information about the gradient and Hessian matrix of the target distribution. Moreover, a new kind of interaction in the adaptation of the proposal densities is introduced, establishing a trade-off between attaining a good performance in terms of mean square error and robustness to initialization.
Citations: 23
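
A heavily simplified sketch of the general idea: a population of Gaussian proposals, deterministic-mixture importance weights, and a gradient step on each proposal mean toward higher target log-density. It uses only the gradient (no Hessian and none of the paper's proposal interaction), and the 1-D target N(3, 1), step size, and sample counts are hypothetical choices for illustration.

```python
import numpy as np

def log_target(x):
    """Hypothetical unnormalized target: N(3, 1)."""
    return -0.5 * (x - 3.0) ** 2

def grad_log_target(x):
    return -(x - 3.0)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def adaptive_population_is(means, sigma=2.0, n_per=500, iters=20, step=0.3, seed=0):
    """Self-normalized IS with a population of Gaussian proposals whose means
    follow gradient steps on the target log-density."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float).copy()
    all_x, all_w = [], []
    for _ in range(iters):
        # Draw n_per samples from each proposal in the population.
        x = (means[:, None] + sigma * rng.standard_normal((means.size, n_per))).ravel()
        # Deterministic-mixture weights: divide by the average of all proposals.
        mixture = gaussian_pdf(x[:, None], means[None, :], sigma).mean(axis=1)
        all_x.append(x)
        all_w.append(np.exp(log_target(x)) / mixture)
        # Adapt: move each proposal mean uphill on the target log-density.
        means = means + step * grad_log_target(means)
    x, w = np.concatenate(all_x), np.concatenate(all_w)
    return np.sum(w * x) / np.sum(w), means

estimate, final_means = adaptive_population_is([-5.0, 0.0, 5.0])
print(estimate, final_means)  # estimate of E[x] under the target, adapted means
```

The proposal width (2) is deliberately larger than the target's (1) so the importance weights stay bounded even while the means are still far from the mode.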

Structure discovery of deep neural network based on evolutionary algorithms
T. Shinozaki, Shinji Watanabe
DOI: 10.1109/ICASSP.2015.7178918 (https://doi.org/10.1109/ICASSP.2015.7178918)
Publication date: 2015-08-06
Abstract: Deep neural networks (DNNs) are constructed by considering highly complicated configurations including network structure and several tuning parameters (number of hidden states and learning rate in each layer), which greatly affect the performance of speech processing applications. To reach optimal performance in such systems, deep understanding and expertise in DNNs are necessary, which limits the development of DNN systems to skilled experts. To overcome this problem, this paper proposes an efficient optimization strategy for DNN structure and parameters using evolutionary algorithms. The proposed approach parametrizes the DNN structure by a directed acyclic graph, and the DNN structure is represented by a simple binary vector. A genetic algorithm and the covariance matrix adaptation evolution strategy efficiently optimize the performance jointly with respect to the above binary vector and the other tuning parameters. Experiments on phoneme recognition and spoken digit detection tasks show the effectiveness of the proposed approach by discovering the appropriate DNN structure automatically.
Citations: 50
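
A minimal sketch of searching over a binary structure vector with a genetic algorithm. One common way to make a bit vector always decode to a directed acyclic graph, assumed here and not taken from the paper, is to let the bits fill the strictly upper-triangular part of an adjacency matrix. The fitness function is a hypothetical stand-in for "train the decoded DNN and score it on development data", and CMA-ES for the continuous parameters is not shown.

```python
import numpy as np

def decode_dag(bits, n_nodes):
    """Bits -> strictly upper-triangular adjacency (edge i->j only if i < j), so acyclic."""
    adj = np.zeros((n_nodes, n_nodes), dtype=int)
    adj[np.triu_indices(n_nodes, k=1)] = bits
    return adj

def genetic_search(fitness, n_bits, pop_size=30, generations=60, seed=0):
    """Binary GA: tournament selection, uniform crossover, bit-flip mutation, elitism."""
    rng = np.random.default_rng(seed)
    p_mut = 1.0 / n_bits
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argmax(scores)].copy()
        history.append(scores.max())
        a, b = rng.integers(0, pop_size, size=(2, pop_size))      # binary tournaments
        parents = np.where((scores[a] >= scores[b])[:, None], pop[a], pop[b])
        mates = np.roll(parents, 1, axis=0)
        children = np.where(rng.random((pop_size, n_bits)) < 0.5, parents, mates)
        flips = rng.random((pop_size, n_bits)) < p_mut
        children = np.where(flips, 1 - children, children)
        children[0] = elite                                        # keep the best
        pop = children
    scores = np.array([fitness(ind) for ind in pop])
    history.append(scores.max())
    return pop[np.argmax(scores)], history

# Hypothetical fitness: agreement with a hidden "good" 6-node structure (15 edge bits).
target = np.random.default_rng(42).integers(0, 2, size=15)
fitness = lambda bits: int(np.sum(bits == target))
best, history = genetic_search(fitness, n_bits=15)
print(history[0], "->", history[-1])
print(decode_dag(best, 6))
```

Elitism makes the best score per generation non-decreasing, which matters when each fitness evaluation is an expensive DNN training run.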

Entropy analysis of i-vector feature spaces in duration-sensitive speaker recognition
A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch
DOI: 10.1109/ICASSP.2015.7178857 (https://doi.org/10.1109/ICASSP.2015.7178857)
Publication date: 2015-04-19
Abstract: The vast majority of speaker recognition cross-entropy evaluations are focused on the score domain. By examining the generalized relative distance between genuine and impostor sub-spaces, biometric characteristics become comparable to other authentication approaches. In this paper we demonstrate that the i-vector feature space's biometric information measured by relative entropy is comparable to, e.g., knowledge-based mechanisms or face recognition. Examining the NIST SRE 2004-2010 corpora, short samples of, e.g., 5 seconds duration already comprise 127 bits in a text-independent scenario. Further, the vast majority of short samples do not fall below 50% of the biometric information of samples having a duration of more than 40 seconds. The generalized i-vector feature space entropy of long samples corresponds to 182.1 bits, and the highest lower entropy bound of a subject was observed at 471.6 bits.
Citations: 11
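
The figures above are relative entropies expressed in bits. To show only how such a quantity is computed and converted to bits, the sketch below evaluates the closed-form KL divergence between two multivariate Gaussians and divides by ln 2. Modelling the genuine and impostor distributions as Gaussians is an assumption made here for illustration; the paper's generalized estimator on i-vectors is not reproduced.

```python
import numpy as np

def gaussian_kl_bits(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) in bits (closed form, then nats / ln 2)."""
    mu0 = np.atleast_1d(mu0).astype(float)
    mu1 = np.atleast_1d(mu1).astype(float)
    cov0 = np.atleast_2d(cov0).astype(float)
    cov1 = np.atleast_2d(cov1).astype(float)
    k = mu0.size
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    nats = 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k
                  + logdet1 - logdet0)
    return nats / np.log(2.0)

# Two unit-variance 1-D Gaussians one standard deviation apart: 0.5 nats.
print(gaussian_kl_bits(0.0, 1.0, 1.0, 1.0))
```

Since the divergence grows with the squared mean separation relative to the spread, better-separated genuine and impostor distributions carry more bits, which is the sense in which the abstract compares i-vector spaces with other authentication mechanisms.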