2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Distributed primal strategies outperform primal-dual strategies over adaptive networks
Zaid J. Towfic, A. H. Sayed
DOI: 10.1109/ICASSP.2015.7178621 (https://doi.org/10.1109/ICASSP.2015.7178621) | Published: 2015-04-19
Abstract: This work studies distributed primal-dual strategies for adaptation and learning over networks from streaming data. Two first-order methods are considered, based on the Arrow-Hurwicz (AH) and augmented Lagrangian (AL) techniques. Several results are revealed in relation to the performance and stability of these strategies when employed over adaptive networks. It is found that these methods have worse steady-state mean-square-error performance than primal methods of the consensus and diffusion type. It is also found that the AH technique can become unstable under a partial observation model, while the other techniques are able to recover the unknown under this scenario. It is further shown that AL techniques are stable over a narrower range of step-sizes than primal strategies.
Citations: 1
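As a rough illustration of the primal strategies the abstract favours, the sketch below runs a diffusion (adapt-then-combine) LMS recursion over a small network of agents estimating a common parameter vector from streaming data. The topology, combination matrix, step-size, and data model are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 4, 2                      # agents, parameter dimension (assumed)
w_true = rng.standard_normal(M)  # common unknown vector
A = np.array([[0.50, 0.25, 0.00, 0.25],   # doubly-stochastic combination matrix (assumed ring)
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
mu = 0.02                        # step-size (assumed)
w = np.zeros((N, M))             # local estimates

for i in range(5000):
    # adapt: each agent takes a stochastic-gradient (LMS) step on its own streaming data
    psi = np.empty_like(w)
    for k in range(N):
        u = rng.standard_normal(M)                     # regressor
        d = u @ w_true + 0.1 * rng.standard_normal()   # noisy measurement
        psi[k] = w[k] + mu * u * (d - u @ w[k])
    # combine: each agent averages its neighbours' intermediate estimates
    w = A @ psi

print("mean-square deviation:", np.mean(np.sum((w - w_true) ** 2, axis=1)))
```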
Far-field speech recognition using CNN-DNN-HMM with convolution in time
Takuya Yoshioka, Shigeki Karita, T. Nakatani
DOI: 10.1109/ICASSP.2015.7178794 (https://doi.org/10.1109/ICASSP.2015.7178794) | Published: 2015-04-19
Abstract: Recent studies in speech recognition have shown that the performance of convolutional neural networks (CNNs) is superior to that of fully connected deep neural networks (DNNs). In this paper, we explore the use of CNNs in far-field speech recognition for dealing with reverberation, which blurs spectral energies along the time axis. Unlike most previous CNN applications to speech recognition, we consider convolution in time to examine whether it provides an improved reverberation modelling capability. Experimental results show that a CNN coupled with a fully connected DNN can model short-time correlations in feature vectors with fewer parameters than a DNN and thus generalise better to unseen test environments. Combining this approach with signal-space dereverberation, which copes with long-term correlations, is shown to result in further improvement, where the gains from both approaches are almost additive. An initial investigation of the use of restricted convolution forms is also undertaken.
Citations: 34
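The snippet below is a minimal sketch of the general idea, not the authors' exact architecture: a convolution applied along the time axis of a short window of log-mel feature frames, followed by fully connected layers that output HMM state posteriors. The feature dimension, window length, filter counts, and number of states are all assumptions.

```python
import torch
import torch.nn as nn

class TimeConvAcousticModel(nn.Module):
    """Convolution along the time axis of a feature-frame window, then a DNN
    producing HMM state posteriors (a hedged sketch, sizes are illustrative)."""
    def __init__(self, feat_dim=40, n_states=2000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, 128, kernel_size=5),  # filters slide over time, not frequency
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        self.dnn = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 3, 1024), nn.ReLU(),   # 3 pooled time steps for an 11-frame window
            nn.Linear(1024, n_states),             # senone posteriors for the HMM
        )

    def forward(self, x):            # x: (batch, feat_dim, 11 frames)
        return self.dnn(self.conv(x))

model = TimeConvAcousticModel()
frames = torch.randn(8, 40, 11)      # batch of 11-frame log-mel windows (assumed input)
print(model(frames).shape)           # -> torch.Size([8, 2000])
```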
Improving long short-term memory networks using maxout units for large vocabulary speech recognition
Xiangang Li, Xihong Wu
DOI: 10.1109/ICASSP.2015.7178842 (https://doi.org/10.1109/ICASSP.2015.7178842) | Published: 2015-04-19
Abstract: Long short-term memory (LSTM) recurrent neural networks have been shown to give state-of-the-art performance on many speech recognition tasks. To achieve a further performance improvement, in this paper maxout units are proposed to be integrated with the LSTM cells, considering that those units have brought significant improvements to deep feed-forward neural networks. A novel architecture was constructed by replacing the input activation units (generally tanh) in the LSTM networks with maxout units. We implemented the LSTM network training on multi-GPU devices with truncated BPTT, and empirically evaluated the proposed designs on a large vocabulary Mandarin conversational telephone speech recognition task. The experimental results support our claim that the performance of LSTM-based acoustic models can be further improved using maxout units.
Citations: 18
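To make the core idea concrete, here is a hedged NumPy sketch of a single LSTM step in which the input activation (normally tanh) is replaced by a maxout unit, i.e. the maximum over several affine pieces. This is an illustration of the substitution described in the abstract, not the authors' exact cell; dimensions and the number of pieces are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def maxout(x, W, b):
    """Maxout unit: elementwise max over k affine pieces. W: (k, out, in), b: (k, out)."""
    return np.max(np.einsum('koi,i->ko', W, x) + b, axis=0)

def lstm_step_maxout(x, h, c, P):
    """One LSTM step where the input activation is a maxout unit instead of tanh."""
    z = np.concatenate([x, h])
    i = sigmoid(P['Wi'] @ z + P['bi'])          # input gate
    f = sigmoid(P['Wf'] @ z + P['bf'])          # forget gate
    o = sigmoid(P['Wo'] @ z + P['bo'])          # output gate
    g = maxout(z, P['Wg'], P['bg'])             # maxout replaces the tanh input activation
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid, k = 4, 8, 2                        # illustrative sizes
P = {
    'Wi': rng.standard_normal((n_hid, n_in + n_hid)), 'bi': np.zeros(n_hid),
    'Wf': rng.standard_normal((n_hid, n_in + n_hid)), 'bf': np.zeros(n_hid),
    'Wo': rng.standard_normal((n_hid, n_in + n_hid)), 'bo': np.zeros(n_hid),
    'Wg': rng.standard_normal((k, n_hid, n_in + n_hid)), 'bg': np.zeros((k, n_hid)),
}
h, c = lstm_step_maxout(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), P)
print(h.shape, c.shape)   # (8,) (8,)
```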
Combining Compressed Sensing with motion correction in acquisition and reconstruction for PET/MR
Thomas Kustner, C. Würslin, H. Schmidt, Bin Yang
DOI: 10.1109/ICASSP.2015.7178077 (https://doi.org/10.1109/ICASSP.2015.7178077) | Published: 2015-04-19
Abstract: In the field of oncology, simultaneous Positron Emission Tomography / Magnetic Resonance (PET/MR) scanners offer great potential for improving diagnostic accuracy. However, to achieve a high Signal-to-Noise Ratio (SNR) for accurate lesion detection and quantification in the PET/MR images, one has to overcome the induced respiratory motion artifacts. The simultaneous acquisition allows an MR-based non-rigid motion correction of the PET data to be performed. It is essential to acquire a 4D (3D + time) motion model as accurately and quickly as possible to minimize additional MR scan time overhead. Therefore, a Compressed Sensing (CS) acquisition by means of variable-density Gaussian subsampling is employed to achieve high accelerations. Reformulating the sparse reconstruction as a combination of the inverse CS problem with a non-rigid motion correction improves the accuracy by alternately projecting the reconstruction results onto either the motion-compensated CS reconstruction or the motion model optimization. In-vivo patient data substantiates the diagnostic improvement.
Citations: 3
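The acquisition side of the abstract, variable-density Gaussian subsampling of k-space, can be illustrated with a short sketch that draws a random sampling mask whose density falls off with distance from the k-space centre. Grid size, width, and acceleration factor are assumptions; the paper's reconstruction and motion model are not reproduced here.

```python
import numpy as np

def gaussian_vd_mask(ny, nz, accel=4.0, sigma=0.25, seed=0):
    """Variable-density Gaussian subsampling mask over the phase-encoding plane:
    the sampling probability is highest at the k-space centre and decays outward.
    A sketch of the acquisition idea only; parameters are illustrative."""
    rng = np.random.default_rng(seed)
    ky, kz = np.meshgrid(np.linspace(-0.5, 0.5, ny),
                         np.linspace(-0.5, 0.5, nz), indexing='ij')
    density = np.exp(-(ky**2 + kz**2) / (2 * sigma**2))
    density *= (ny * nz / accel) / density.sum()        # scale to the target acceleration
    return rng.random((ny, nz)) < np.clip(density, 0, 1)

mask = gaussian_vd_mask(128, 96, accel=4.0)
print("effective acceleration: %.2f" % (mask.size / mask.sum()))
```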
Precoder and equalizer design for multi-user MIMO FBMC/OQAM with highly frequency selective channels
Yao Cheng, L. Baltar, M. Haardt, J. Nossek
DOI: 10.1109/ICASSP.2015.7178407 (https://doi.org/10.1109/ICASSP.2015.7178407) | Published: 2015-04-19
Abstract: In this contribution we propose two new designs of transmit and receive processing for multi-user multiple-input multiple-output (MIMO) downlink systems that employ filter-bank-based multicarrier with offset quadrature amplitude modulation (FBMC/OQAM). Our goal is to overcome the limits on the channel frequency selectivity and/or the allowed number of receive antennas per user terminal that are imposed on the state-of-the-art solutions. In the first method the design of precoders and equalizers is iterative and minimum mean square error (MMSE) based. The second is a closed-form design based on the signal-to-leakage ratio (SLR). Via numerical simulations we evaluate the performance of both methods and demonstrate their superiority over two other approaches in the literature.
Citations: 12
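For orientation, the sketch below computes a basic per-subcarrier MMSE (regularized zero-forcing) downlink precoder for a flat channel. It is a simplified stand-in under assumed dimensions; the paper's iterative MMSE design and the FBMC/OQAM-specific handling of intrinsic interference and high frequency selectivity are not reproduced.

```python
import numpy as np

def mmse_precoder(H, noise_var):
    """MMSE (regularised zero-forcing) precoder P = H^H (H H^H + K*sigma^2 I)^{-1},
    normalised to unit transmit power. K user streams stacked in the rows of H."""
    K, _ = H.shape
    G = H @ H.conj().T + K * noise_var * np.eye(K)
    P = H.conj().T @ np.linalg.inv(G)
    return P / np.linalg.norm(P)

rng = np.random.default_rng(1)
K, M = 3, 4                                   # users, transmit antennas (assumed)
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P = mmse_precoder(H, noise_var=0.1)
print(np.round(np.abs(H @ P), 2))             # near-diagonal -> little inter-user leakage
```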
Feature enhancement based on generative-discriminative hybrid approach with GMMs and DNNs for noise robust speech recognition
M. Fujimoto, T. Nakatani
DOI: 10.1109/ICASSP.2015.7178926 (https://doi.org/10.1109/ICASSP.2015.7178926) | Published: 2015-04-19
Abstract: This paper presents a technique that combines generative and discriminative approaches with Gaussian mixture models (GMMs) and deep neural networks (DNNs) for model-based feature enhancement. Typical model-based feature enhancement employs a generative model approach. The enhanced features are obtained by using the weighted sum of linear transformations given by each Gaussian component contained in the GMMs and the corresponding posterior probabilities. The computation of posterior probabilities is a crucial factor for this kind of feature enhancement, and can also be formulated as the class discrimination problem of observed noisy features. The prominent discriminability of DNNs is a well-known solution to this discrimination problem. Therefore, we propose the use of DNNs for computing the posterior probabilities. The proposed method incorporates the benefit of the discriminative approach into the generative approach. For AURORA2 task evaluations, the proposed method provided noticeable improvements compared with results obtained using the conventional generative model approach.
Citations: 8
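The enhancement rule described in the abstract, a posterior-weighted sum of per-component transforms of the noisy feature, can be sketched in a few lines. In the paper the posteriors come from a DNN; here a softmax over arbitrary scores stands in for that DNN, and the transforms, dimensions, and mixture size are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def enhance(y, A, b, posteriors):
    """Model-based feature enhancement sketch: clean-feature estimate as a
    posterior-weighted sum of per-component affine transforms of the noisy y."""
    x_hat = np.zeros_like(y)
    for k, gamma in enumerate(posteriors):
        x_hat += gamma * (A[k] @ y + b[k])
    return x_hat

rng = np.random.default_rng(0)
D, K = 13, 8                                            # feature dim, components (assumed)
y = rng.standard_normal(D)                              # noisy MFCC-like feature
A = rng.standard_normal((K, D, D)) * 0.1 + np.eye(D)    # per-component transforms (toy values)
b = rng.standard_normal((K, D)) * 0.1
posteriors = softmax(rng.standard_normal(K))            # stand-in for DNN outputs p(k | y)
print(enhance(y, A, b, posteriors).shape)               # (13,)
```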
Vocal responses to frequency modulated composite sinewaves via auditory and vibrotactile pathways
Xiaozhen Wang, K. Honda, J. Dang, Jianguo Wei
DOI: 10.1109/ICASSP.2015.7178793 (https://doi.org/10.1109/ICASSP.2015.7178793) | Published: 2015-04-19
Abstract: Feedback control mechanisms for speaking have been examined using the transformed auditory feedback (TAF) technique. Previous studies have shown that speakers demonstrate fundamental frequency (F0) changes when they monitor their voice with artificial alterations of F0. However, those studies underestimate the role of vibrotactile information involved in feedback F0 control. This pilot study aims at exploring whether and how vibrotactile information from the larynx influences vowel F0. Participants in our experiment were asked to sustain a vowel with their F0 adjusted to composite sinewave stimuli, which were given via auditory and vibrotactile channels using a headset on the ears or a bone-conduction transducer on the larynx. Results revealed greater compensatory responses to combined vibrotactile-auditory stimuli than to auditory-only stimuli. The effect of vibrotactile stimuli on feedback F0 adjustment was also observed in the shorter latency of the responses.
Citations: 2
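As a small illustration of the stimulus type named in the title, the sketch below generates a frequency-modulated composite sinewave: a fundamental plus harmonics whose shared F0 is slowly modulated. All parameters (modulation depth, rate, harmonic count, duration) are assumptions; the paper's actual stimulus design and presentation levels are not specified here.

```python
import numpy as np

def fm_composite(f0=120.0, harmonics=(1, 2, 3), mod_depth_cents=50.0,
                 mod_rate=0.5, dur=3.0, fs=16000):
    """Frequency-modulated composite sinewave stimulus (illustrative parameters only)."""
    t = np.arange(int(dur * fs)) / fs
    # F0 trajectory modulated in cents around the base frequency
    f0_t = f0 * 2 ** ((mod_depth_cents / 1200.0) * np.sin(2 * np.pi * mod_rate * t))
    phase = 2 * np.pi * np.cumsum(f0_t) / fs           # integrate instantaneous F0
    return sum(np.sin(h * phase) / h for h in harmonics)

stim = fm_composite()
print(stim.shape, float(stim.max()))
```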
Dereverberation sweet spot dilation with combined channel equalization and beamforming
Mark R. P. Thomas, H. Gamper, I. Tashev
DOI: 10.1109/ICASSP.2015.7178069 (https://doi.org/10.1109/ICASSP.2015.7178069) | Published: 2015-04-19
Abstract: Beamformers and channel equalizers can be formulated as optimal multichannel filter-and-sum operations with different objective criteria. It has been shown in previous studies that the combination of both concepts under a common framework can yield results that combine the spatial robustness of beamforming with the dereverberation performance of channel equalization. This paper introduces an additional method for leveraging both approaches that exploits channel estimates at a wanted spatial location and derives robustness from knowledge of the array geometry alone. Experiments with an objective assessment of speech quality as a function of source perturbation reveal that the proposed technique can be viewed as a sweet spot dilator when compared with the MINT channel equalizer.
Citations: 0
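To ground the channel-equalization half of the comparison, here is a hedged sketch of a multichannel least-squares (MINT-style) equalizer: per-channel FIR filters chosen so that the filtered-and-summed channels approximate a unit impulse. The toy impulse responses and filter lengths are assumptions, and the beamforming combination and proposed sweet-spot dilation are not reproduced.

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, Lg):
    """Convolution (Sylvester) matrix so that H @ g == np.convolve(h, g)."""
    col = np.concatenate([h, np.zeros(Lg - 1)])
    row = np.zeros(Lg); row[0] = h[0]
    return toeplitz(col, row)

def mint_filters(hs, Lg):
    """Multichannel least-squares equalizer: filters g_m with sum_m h_m * g_m ~ delta."""
    H = np.hstack([conv_matrix(h, Lg) for h in hs])        # stack channel convolution matrices
    d = np.zeros(H.shape[0]); d[0] = 1.0                   # target: unit impulse
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return np.split(g, len(hs))

rng = np.random.default_rng(0)
hs = [rng.standard_normal(32) * np.exp(-0.2 * np.arange(32)) for _ in range(2)]  # toy room responses
gs = mint_filters(hs, Lg=40)
eq = sum(np.convolve(h, g) for h, g in zip(hs, gs))        # equalized overall response
print("residual reverberation energy:", float(np.sum(eq**2) - eq[0]**2))
```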
A virtual resampling technique for algebraic two-dimensional phase unwrapping
D. Kitahara, M. Yamagishi, I. Yamada
DOI: 10.1109/ICASSP.2015.7178696 (https://doi.org/10.1109/ICASSP.2015.7178696) | Published: 2015-04-19
Abstract: Two-dimensional (2D) phase unwrapping is the problem of reconstructing a continuous phase, defined over a 2D domain, from its wrapped samples. In our previous work, we presented a two-step phase unwrapping algorithm which first constructs, as the real and imaginary parts of a complex function, a pair of piecewise polynomials having no common zero over the domain, and then estimates the unwrapped phase by applying algebraic phase unwrapping. In this paper, we propose a preprocessing step for the above algorithm that avoids the appearance of zeros of the complex function in the first step. The proposed preprocessing is implemented by convex optimization and resampling, and its effectiveness is shown in terrain height estimation with interferometric synthetic aperture radar.
Citations: 4
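To make the underlying problem concrete, the sketch below wraps a smooth synthetic phase surface and unwraps it with a simple row-then-column (Itoh-style) baseline. This is only a point of reference for the task the abstract addresses; the authors' algebraic method with piecewise polynomials, and the proposed virtual resampling, are not implemented here.

```python
import numpy as np

def wrap(phi):
    """Wrap phase into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

# Synthetic smooth phase surface (illustrative), wrapped and then unwrapped
# with a baseline row-then-column unwrap.
y, x = np.mgrid[0:64, 0:64] / 64.0
true_phase = 12.0 * np.exp(-((x - 0.5)**2 + (y - 0.5)**2) / 0.1)
wrapped = wrap(true_phase)
unwrapped = np.unwrap(np.unwrap(wrapped, axis=1), axis=0)
# The estimate can differ from the truth by a global multiple of 2*pi.
offset = 2 * np.pi * np.round((true_phase - unwrapped).mean() / (2 * np.pi))
print("max error:", float(np.abs(true_phase - unwrapped - offset).max()))
```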
Intonational phrase break prediction for text-to-speech synthesis using dependency relations
Taniya Mishra, Yeon-Jun Kim, S. Bangalore
DOI: 10.1109/ICASSP.2015.7178906 (https://doi.org/10.1109/ICASSP.2015.7178906) | Published: 2015-04-19
Abstract: Intonational phrase (IP) break prediction is an important aspect of front-end analysis in a text-to-speech system. Standard approaches to intonational phrase break prediction rely on linguistic rules or, more recently, lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, in this paper we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech tags and dependency relations achieves an F-score of 0.81 with the added benefit of being more portable across domains. In this work, we also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag-based model results in only a 2% drop in recall compared to a model that uses both a left and right context, which suggests the viability of such a model for an incremental text-to-speech system.
Citations: 13
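A non-lexicalized break predictor of the kind described can be sketched as a classifier over POS-tag context features. The tiny hand-made dataset and the feature template below are purely hypothetical stand-ins for a real annotated corpus, and the classifier choice is not the paper's; the sketch only shows the shape of such a model.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy data: each token is described by POS tags in a small window and labelled 1
# if an intonational phrase break follows it (hypothetical examples).
samples = [
    ({"pos": "NN", "pos-1": "DT", "pos+1": ","}, 1),
    ({"pos": "VB", "pos-1": "NN", "pos+1": "DT"}, 0),
    ({"pos": "NN", "pos-1": "JJ", "pos+1": "."}, 1),
    ({"pos": "DT", "pos-1": "IN", "pos+1": "NN"}, 0),
    ({"pos": "NN", "pos-1": "NN", "pos+1": "CC"}, 1),
    ({"pos": "IN", "pos-1": "VB", "pos+1": "DT"}, 0),
]
X_dicts, y = zip(*samples)

vec = DictVectorizer()                       # one-hot encodes the POS context features
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training F-score:", f1_score(y, clf.predict(X)))
```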