2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Distributed primal strategies outperform primal-dual strategies over adaptive networks
Zaid J. Towfic, A. H. Sayed
DOI: 10.1109/ICASSP.2015.7178621 (https://doi.org/10.1109/ICASSP.2015.7178621) | Published: 2015-04-19
Abstract: This work studies distributed primal-dual strategies for adaptation and learning over networks from streaming data. Two first-order methods are considered, based on the Arrow-Hurwicz (AH) and augmented Lagrangian (AL) techniques. Several results are revealed in relation to the performance and stability of these strategies when employed over adaptive networks. It is found that these methods have worse steady-state mean-square-error performance than primal methods of the consensus and diffusion type. It is also found that the AH technique can become unstable under a partial observation model, while the other techniques are able to recover the unknown under this scenario. It is further shown that AL techniques are stable over a narrower range of step-sizes than primal strategies.
Citations: 1
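As a rough illustration of the primal strategies the abstract favours, the sketch below runs a diffusion (adapt-then-combine) LMS recursion over a small network of agents estimating a common parameter vector from streaming data. The topology, combination matrix, step-size, and data model are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 4, 2                      # agents, parameter dimension (assumed)
w_true = rng.standard_normal(M)  # common unknown vector
A = np.array([[0.50, 0.25, 0.00, 0.25],   # doubly-stochastic combination matrix (assumed ring)
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
mu = 0.02                        # step-size (assumed)
w = np.zeros((N, M))             # local estimates

for i in range(5000):
    # adapt: each agent takes a stochastic-gradient (LMS) step on its own streaming data
    psi = np.empty_like(w)
    for k in range(N):
        u = rng.standard_normal(M)                     # regressor
        d = u @ w_true + 0.1 * rng.standard_normal()   # noisy measurement
        psi[k] = w[k] + mu * u * (d - u @ w[k])
    # combine: each agent averages its neighbours' intermediate estimates
    w = A @ psi

print("mean-square deviation:", np.mean(np.sum((w - w_true) ** 2, axis=1)))
```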
Far-field speech recognition using CNN-DNN-HMM with convolution in time
Takuya Yoshioka, Shigeki Karita, T. Nakatani
DOI: 10.1109/ICASSP.2015.7178794 (https://doi.org/10.1109/ICASSP.2015.7178794) | Published: 2015-04-19
Abstract: Recent studies in speech recognition have shown that the performance of convolutional neural networks (CNNs) is superior to that of fully connected deep neural networks (DNNs). In this paper, we explore the use of CNNs in far-field speech recognition for dealing with reverberation, which blurs spectral energies along the time axis. Unlike most previous CNN applications to speech recognition, we consider convolution in time to examine whether it provides an improved reverberation modelling capability. Experimental results show that a CNN coupled with a fully connected DNN can model short-time correlations in feature vectors with fewer parameters than a DNN and thus generalise better to unseen test environments. Combining this approach with signal-space dereverberation, which copes with long-term correlations, is shown to result in further improvement, where the gains from both approaches are almost additive. An initial investigation of the use of restricted convolution forms is also undertaken.
Citations: 34
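The snippet below is a minimal sketch of the general idea, not the authors' exact architecture: a convolution applied along the time axis of a short window of log-mel feature frames, followed by fully connected layers that output HMM state posteriors. The feature dimension, window length, filter counts, and number of states are all assumptions.

```python
import torch
import torch.nn as nn

class TimeConvAcousticModel(nn.Module):
    """Convolution along the time axis of a feature-frame window, then a DNN
    producing HMM state posteriors (a hedged sketch, sizes are illustrative)."""
    def __init__(self, feat_dim=40, n_states=2000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, 128, kernel_size=5),  # filters slide over time, not frequency
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        self.dnn = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 3, 1024), nn.ReLU(),   # 3 pooled time steps for an 11-frame window
            nn.Linear(1024, n_states),             # senone posteriors for the HMM
        )

    def forward(self, x):            # x: (batch, feat_dim, 11 frames)
        return self.dnn(self.conv(x))

model = TimeConvAcousticModel()
frames = torch.randn(8, 40, 11)      # batch of 11-frame log-mel windows (assumed input)
print(model(frames).shape)           # -> torch.Size([8, 2000])
```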
Improving long short-term memory networks using maxout units for large vocabulary speech recognition
Xiangang Li, Xihong Wu
DOI: 10.1109/ICASSP.2015.7178842 (https://doi.org/10.1109/ICASSP.2015.7178842) | Published: 2015-04-19
Abstract: Long short-term memory (LSTM) recurrent neural networks have been shown to give state-of-the-art performance on many speech recognition tasks. To achieve a further performance improvement, in this paper maxout units are proposed to be integrated with the LSTM cells, considering that those units have brought significant improvements to deep feed-forward neural networks. A novel architecture was constructed by replacing the input activation units (generally tanh) in the LSTM networks with maxout units. We implemented the LSTM network training on multi-GPU devices with truncated BPTT, and empirically evaluated the proposed designs on a large vocabulary Mandarin conversational telephone speech recognition task. The experimental results support our claim that the performance of LSTM-based acoustic models can be further improved using maxout units.
Citations: 18
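To make the core idea concrete, here is a hedged NumPy sketch of a single LSTM step in which the input activation (normally tanh) is replaced by a maxout unit, i.e. the maximum over several affine pieces. This is an illustration of the substitution described in the abstract, not the authors' exact cell; dimensions and the number of pieces are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def maxout(x, W, b):
    """Maxout unit: elementwise max over k affine pieces. W: (k, out, in), b: (k, out)."""
    return np.max(np.einsum('koi,i->ko', W, x) + b, axis=0)

def lstm_step_maxout(x, h, c, P):
    """One LSTM step where the input activation is a maxout unit instead of tanh."""
    z = np.concatenate([x, h])
    i = sigmoid(P['Wi'] @ z + P['bi'])          # input gate
    f = sigmoid(P['Wf'] @ z + P['bf'])          # forget gate
    o = sigmoid(P['Wo'] @ z + P['bo'])          # output gate
    g = maxout(z, P['Wg'], P['bg'])             # maxout replaces the tanh input activation
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid, k = 4, 8, 2                        # illustrative sizes
P = {
    'Wi': rng.standard_normal((n_hid, n_in + n_hid)), 'bi': np.zeros(n_hid),
    'Wf': rng.standard_normal((n_hid, n_in + n_hid)), 'bf': np.zeros(n_hid),
    'Wo': rng.standard_normal((n_hid, n_in + n_hid)), 'bo': np.zeros(n_hid),
    'Wg': rng.standard_normal((k, n_hid, n_in + n_hid)), 'bg': np.zeros((k, n_hid)),
}
h, c = lstm_step_maxout(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), P)
print(h.shape, c.shape)   # (8,) (8,)
```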
Combining Compressed Sensing with motion correction in acquisition and reconstruction for PET/MR
Thomas Kustner, C. Würslin, H. Schmidt, Bin Yang
DOI: 10.1109/ICASSP.2015.7178077 (https://doi.org/10.1109/ICASSP.2015.7178077) | Published: 2015-04-19
Abstract: In the field of oncology, simultaneous Positron Emission Tomography / Magnetic Resonance (PET/MR) scanners offer great potential for improving diagnostic accuracy. However, to achieve a high Signal-to-Noise Ratio (SNR) for accurate lesion detection and quantification in the PET/MR images, one has to overcome the induced respiratory motion artifacts. The simultaneous acquisition allows an MR-based non-rigid motion correction of the PET data to be performed. It is essential to acquire a 4D (3D + time) motion model as accurately and quickly as possible to minimize additional MR scan time overhead. Therefore, a Compressed Sensing (CS) acquisition by means of variable-density Gaussian subsampling is employed to achieve high accelerations. Reformulating the sparse reconstruction as a combination of the inverse CS problem with a non-rigid motion correction improves the accuracy by alternately projecting the reconstruction results onto either the motion-compensated CS reconstruction or the motion model optimization. In-vivo patient data substantiates the diagnostic improvement.
Citations: 3
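The acquisition side of the abstract, variable-density Gaussian subsampling of k-space, can be illustrated with a short sketch that draws a random sampling mask whose density falls off with distance from the k-space centre. Grid size, width, and acceleration factor are assumptions; the paper's reconstruction and motion model are not reproduced here.

```python
import numpy as np

def gaussian_vd_mask(ny, nz, accel=4.0, sigma=0.25, seed=0):
    """Variable-density Gaussian subsampling mask over the phase-encoding plane:
    the sampling probability is highest at the k-space centre and decays outward.
    A sketch of the acquisition idea only; parameters are illustrative."""
    rng = np.random.default_rng(seed)
    ky, kz = np.meshgrid(np.linspace(-0.5, 0.5, ny),
                         np.linspace(-0.5, 0.5, nz), indexing='ij')
    density = np.exp(-(ky**2 + kz**2) / (2 * sigma**2))
    density *= (ny * nz / accel) / density.sum()        # scale to the target acceleration
    return rng.random((ny, nz)) < np.clip(density, 0, 1)

mask = gaussian_vd_mask(128, 96, accel=4.0)
print("effective acceleration: %.2f" % (mask.size / mask.sum()))
```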
Precoder and equalizer design for multi-user MIMO FBMC/OQAM with highly frequency selective channels
Yao Cheng, L. Baltar, M. Haardt, J. Nossek
DOI: 10.1109/ICASSP.2015.7178407 (https://doi.org/10.1109/ICASSP.2015.7178407) | Published: 2015-04-19
Abstract: In this contribution we propose two new designs of transmit and receive processing for multi-user multiple-input multiple-output (MIMO) downlink systems that employ filter-bank-based multicarrier with offset quadrature amplitude modulation (FBMC/OQAM). Our goal is to overcome the limits on the channel frequency selectivity and/or the allowed number of receive antennas per user terminal that are imposed on the state-of-the-art solutions. In the first method the design of precoders and equalizers is iterative and minimum mean square error (MMSE) based. The second is a closed-form design based on the signal-to-leakage ratio (SLR). Via numerical simulations we evaluate the performance of both methods and demonstrate their superiority over two other approaches in the literature.
Citations: 12
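For orientation, the sketch below computes a basic per-subcarrier MMSE (regularized zero-forcing) downlink precoder for a flat channel. It is a simplified stand-in under assumed dimensions; the paper's iterative MMSE design and the FBMC/OQAM-specific handling of intrinsic interference and high frequency selectivity are not reproduced.

```python
import numpy as np

def mmse_precoder(H, noise_var):
    """MMSE (regularised zero-forcing) precoder P = H^H (H H^H + K*sigma^2 I)^{-1},
    normalised to unit transmit power. K user streams stacked in the rows of H."""
    K, _ = H.shape
    G = H @ H.conj().T + K * noise_var * np.eye(K)
    P = H.conj().T @ np.linalg.inv(G)
    return P / np.linalg.norm(P)

rng = np.random.default_rng(1)
K, M = 3, 4                                   # users, transmit antennas (assumed)
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P = mmse_precoder(H, noise_var=0.1)
print(np.round(np.abs(H @ P), 2))             # near-diagonal -> little inter-user leakage
```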
Feature enhancement based on generative-discriminative hybrid approach with GMMs and DNNs for noise robust speech recognition
M. Fujimoto, T. Nakatani
DOI: 10.1109/ICASSP.2015.7178926 (https://doi.org/10.1109/ICASSP.2015.7178926) | Published: 2015-04-19
Abstract: This paper presents a technique that combines generative and discriminative approaches with Gaussian mixture models (GMMs) and deep neural networks (DNNs) for model-based feature enhancement. Typical model-based feature enhancement employs a generative model approach. The enhanced features are obtained by using the weighted sum of linear transformations given by each Gaussian component contained in the GMMs and the corresponding posterior probabilities. The computation of posterior probabilities is a crucial factor for this kind of feature enhancement, and can also be formulated as the class discrimination problem of observed noisy features. The prominent discriminability of DNNs is a well-known solution to this discrimination problem. Therefore, we propose the use of DNNs for computing the posterior probabilities. The proposed method incorporates the benefit of the discriminative approach into the generative approach. For AURORA2 task evaluations, the proposed method provided noticeable improvements compared with results obtained using the conventional generative model approach.
Citations: 8
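The enhancement rule described in the abstract, a posterior-weighted sum of per-component transforms of the noisy feature, can be sketched in a few lines. In the paper the posteriors come from a DNN; here a softmax over arbitrary scores stands in for that DNN, and the transforms, dimensions, and mixture size are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def enhance(y, A, b, posteriors):
    """Model-based feature enhancement sketch: clean-feature estimate as a
    posterior-weighted sum of per-component affine transforms of the noisy y."""
    x_hat = np.zeros_like(y)
    for k, gamma in enumerate(posteriors):
        x_hat += gamma * (A[k] @ y + b[k])
    return x_hat

rng = np.random.default_rng(0)
D, K = 13, 8                                            # feature dim, components (assumed)
y = rng.standard_normal(D)                              # noisy MFCC-like feature
A = rng.standard_normal((K, D, D)) * 0.1 + np.eye(D)    # per-component transforms (toy values)
b = rng.standard_normal((K, D)) * 0.1
posteriors = softmax(rng.standard_normal(K))            # stand-in for DNN outputs p(k | y)
print(enhance(y, A, b, posteriors).shape)               # (13,)
```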
Vocal responses to frequency modulated composite sinewaves via auditory and vibrotactile pathways
Xiaozhen Wang, K. Honda, J. Dang, Jianguo Wei
DOI: 10.1109/ICASSP.2015.7178793 (https://doi.org/10.1109/ICASSP.2015.7178793) | Published: 2015-04-19
Abstract: Feedback control mechanisms for speaking have been examined using the transformed auditory feedback (TAF) technique. Previous studies have shown that speakers demonstrate fundamental frequency (F0) changes when they monitor their voice with artificial alterations of F0. However, those studies underestimate the role of vibrotactile information involved in feedback F0 control. This pilot study aims at exploring whether and how vibrotactile information from the larynx influences vowel F0. Participants in our experiment were asked to sustain a vowel with their F0 adjusted to composite sinewave stimuli, which were given via auditory and vibrotactile channels using a headset on the ears or a bone-conduction transducer on the larynx. Results revealed greater compensatory responses to combined vibrotactile-auditory stimuli than to auditory-only stimuli. The effect of vibrotactile stimuli on feedback F0 adjustment was also observed in the shorter latency of the responses.
Citations: 2
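As a small illustration of the stimulus type named in the title, the sketch below generates a frequency-modulated composite sinewave: a fundamental plus harmonics whose shared F0 is slowly modulated. All parameters (modulation depth, rate, harmonic count, duration) are assumptions; the paper's actual stimulus design and presentation levels are not specified here.

```python
import numpy as np

def fm_composite(f0=120.0, harmonics=(1, 2, 3), mod_depth_cents=50.0,
                 mod_rate=0.5, dur=3.0, fs=16000):
    """Frequency-modulated composite sinewave stimulus (illustrative parameters only)."""
    t = np.arange(int(dur * fs)) / fs
    # F0 trajectory modulated in cents around the base frequency
    f0_t = f0 * 2 ** ((mod_depth_cents / 1200.0) * np.sin(2 * np.pi * mod_rate * t))
    phase = 2 * np.pi * np.cumsum(f0_t) / fs           # integrate instantaneous F0
    return sum(np.sin(h * phase) / h for h in harmonics)

stim = fm_composite()
print(stim.shape, float(stim.max()))
```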
Dereverberation sweet spot dilation with combined channel equalization and beamforming
Mark R. P. Thomas, H. Gamper, I. Tashev
DOI: 10.1109/ICASSP.2015.7178069 (https://doi.org/10.1109/ICASSP.2015.7178069) | Published: 2015-04-19
Abstract: Beamformers and channel equalizers can be formulated as optimal multichannel filter-and-sum operations with different objective criteria. It has been shown in previous studies that the combination of both concepts under a common framework can yield results that combine the spatial robustness of beamforming with the dereverberation performance of channel equalization. This paper introduces an additional method for leveraging both approaches that exploits channel estimates at a wanted spatial location and derives robustness from knowledge of the array geometry alone. Experiments with an objective assessment of speech quality as a function of source perturbation reveal that the proposed technique can be viewed as a sweet spot dilator when compared with the MINT channel equalizer.
Citations: 0
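To ground the channel-equalization half of the comparison, here is a hedged sketch of a multichannel least-squares (MINT-style) equalizer: per-channel FIR filters chosen so that the filtered-and-summed channels approximate a unit impulse. The toy impulse responses and filter lengths are assumptions, and the beamforming combination and proposed sweet-spot dilation are not reproduced.

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, Lg):
    """Convolution (Sylvester) matrix so that H @ g == np.convolve(h, g)."""
    col = np.concatenate([h, np.zeros(Lg - 1)])
    row = np.zeros(Lg); row[0] = h[0]
    return toeplitz(col, row)

def mint_filters(hs, Lg):
    """Multichannel least-squares equalizer: filters g_m with sum_m h_m * g_m ~ delta."""
    H = np.hstack([conv_matrix(h, Lg) for h in hs])        # stack channel convolution matrices
    d = np.zeros(H.shape[0]); d[0] = 1.0                   # target: unit impulse
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return np.split(g, len(hs))

rng = np.random.default_rng(0)
hs = [rng.standard_normal(32) * np.exp(-0.2 * np.arange(32)) for _ in range(2)]  # toy room responses
gs = mint_filters(hs, Lg=40)
eq = sum(np.convolve(h, g) for h, g in zip(hs, gs))        # equalized overall response
print("residual reverberation energy:", float(np.sum(eq**2) - eq[0]**2))
```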
A virtual resampling technique for algebraic two-dimensional phase unwrapping
D. Kitahara, M. Yamagishi, I. Yamada
DOI: 10.1109/ICASSP.2015.7178696 (https://doi.org/10.1109/ICASSP.2015.7178696) | Published: 2015-04-19
Abstract: Two-dimensional (2D) phase unwrapping is the problem of reconstructing a continuous phase, defined over a 2D domain, from its wrapped samples. In our previous work, we presented a two-step phase unwrapping algorithm which first constructs, as the real and imaginary parts of a complex function, a pair of piecewise polynomials having no common zero over the domain, and then estimates the unwrapped phase by applying algebraic phase unwrapping. In this paper, we propose a preprocessing step for the above algorithm that avoids the appearance of zeros of the complex function in the first step. The proposed preprocessing is implemented by convex optimization and resampling, and its effectiveness is shown in terrain height estimation with interferometric synthetic aperture radar.
Citations: 4
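To make the underlying problem concrete, the sketch below wraps a smooth synthetic phase surface and unwraps it with a simple row-then-column (Itoh-style) baseline. This is only a point of reference for the task the abstract addresses; the authors' algebraic method with piecewise polynomials, and the proposed virtual resampling, are not implemented here.

```python
import numpy as np

def wrap(phi):
    """Wrap phase into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

# Synthetic smooth phase surface (illustrative), wrapped and then unwrapped
# with a baseline row-then-column unwrap.
y, x = np.mgrid[0:64, 0:64] / 64.0
true_phase = 12.0 * np.exp(-((x - 0.5)**2 + (y - 0.5)**2) / 0.1)
wrapped = wrap(true_phase)
unwrapped = np.unwrap(np.unwrap(wrapped, axis=1), axis=0)
# The estimate can differ from the truth by a global multiple of 2*pi.
offset = 2 * np.pi * np.round((true_phase - unwrapped).mean() / (2 * np.pi))
print("max error:", float(np.abs(true_phase - unwrapped - offset).max()))
```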
Intonational phrase break prediction for text-to-speech synthesis using dependency relations
Taniya Mishra, Yeon-Jun Kim, S. Bangalore
DOI: 10.1109/ICASSP.2015.7178906 (https://doi.org/10.1109/ICASSP.2015.7178906) | Published: 2015-04-19
Abstract: Intonational phrase (IP) break prediction is an important aspect of front-end analysis in a text-to-speech system. Standard approaches to intonational phrase break prediction rely on linguistic rules or, more recently, lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, in this paper we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech tags and dependency relations achieves an F-score of 0.81 with the added benefit of being more portable across domains. In this work, we also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag-based model results in only a 2% drop in recall compared to a model that uses both a left and right context, which suggests the viability of such a model for an incremental text-to-speech system.
Citations: 13
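A non-lexicalized break predictor of the kind described can be sketched as a classifier over POS-tag context features. The tiny hand-made dataset and the feature template below are purely hypothetical stand-ins for a real annotated corpus, and the classifier choice is not the paper's; the sketch only shows the shape of such a model.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy data: each token is described by POS tags in a small window and labelled 1
# if an intonational phrase break follows it (hypothetical examples).
samples = [
    ({"pos": "NN", "pos-1": "DT", "pos+1": ","}, 1),
    ({"pos": "VB", "pos-1": "NN", "pos+1": "DT"}, 0),
    ({"pos": "NN", "pos-1": "JJ", "pos+1": "."}, 1),
    ({"pos": "DT", "pos-1": "IN", "pos+1": "NN"}, 0),
    ({"pos": "NN", "pos-1": "NN", "pos+1": "CC"}, 1),
    ({"pos": "IN", "pos-1": "VB", "pos+1": "DT"}, 0),
]
X_dicts, y = zip(*samples)

vec = DictVectorizer()                       # one-hot encodes the POS context features
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training F-score:", f1_score(y, clf.predict(X)))
```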