DP-DWA: Dual-Path Dynamic Weight Attention Network with Streaming DFSMN-SAN for Automatic Speech Recognition
Dongpeng Ma, Yiwen Wang, Liqiang He, Mingjie Jin, Dan Su, Dong Yu
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9746328

Abstract: In multi-channel far-field automatic speech recognition (ASR) scenarios, front-end processing introduces distortion into the speech signal, which degrades recognition performance. In this paper, we propose a dual-path network for the far-field acoustic model that takes the voice processing (VP) signal and the acoustic echo cancellation (AEC) signal as input. Specifically, we design a dynamic weight attention (DWA) module to combine the two signals. In addition, we streamline our best deep feed-forward sequential memory network with self-attention (DFSMN-SAN) acoustic model to meet real-time requirements. A joint-training strategy is adopted to optimize the proposed approach. With the dual-path network, we achieve a 54.5% relative improvement in character error rate (CER) on a 10,000-hour online conference task. Moreover, the proposed method is unaffected by the microphone-array geometry: we achieve a 23.56% relative improvement on a vehicle task that uses a two-microphone array.
Fre-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis
Sang-Hoon Lee, Ji-Hoon Kim, Kangeun Lee, Seong-Whan Lee
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9746675

Abstract: Although recent advances in neural vocoders have brought significant improvements, most of these models trade audio quality against computational complexity. Since large models are impractical on low-resource devices, a more efficient neural vocoder is needed to synthesize high-quality audio. In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. For fast synthesis, Fre-GAN 2 synthesizes only the low- and high-frequency parts of the audio, and we leverage the inverse discrete wavelet transform to reproduce the target-resolution audio in the generator. Additionally, we introduce adversarial periodic feature distillation, which lets the model synthesize high-quality audio with only a small number of parameters. The experimental results show the superiority of Fre-GAN 2 in audio quality. Furthermore, Fre-GAN 2 achieves a 10.91× generation speedup and a 21.23× parameter compression compared with Fre-GAN.
{"title":"Local Context Interaction-Aware Glyph-Vectors for Chinese Sequence Tagging","authors":"Junyu Lu, Pingjian Zhang","doi":"10.1109/icassp43922.2022.9747303","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747303","url":null,"abstract":"As hieroglyphics, Chinese characters contain rich semantic and glyphs information, which is beneficial to sequence tagging task. However, it’s difficult for shallow CNNs architecture to extract glyphs information from character data and implement the con-textual interaction of different glyphs information effectively. In this paper, we address these issues by presenting LCIN: a Local Context Interaction-aware Network for glyph-vectors extraction. The network utilizes depthwise separable convolution and attention machine to extract glyphs information from images of Chinese characters. Moreover, we interconnect adjacent attention blocks so that glyphs information can flow within the local context. Experiments on three subtasks for sequence tagging show that our method out-performs other glyph-based models and achieves new SOTA results in a wide range of datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129730452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio
John Harvill, Yash R. Wani, Moitreya Chatterjee, M. Alam, D. Beiser, David Chestek, M. Hasegawa-Johnson, N. Ahuja
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9746015

Abstract: The distinct cough sounds produced by a variety of respiratory diseases suggest the potential for a new class of audio biomarkers for the detection of COVID-19. Accurate audio-biomarker-based COVID-19 tests would be inexpensive, readily scalable, and non-invasive, and such screening could also be used in resource-limited settings prior to traditional diagnostic testing. Here we explore the possibility of leveraging three audio modalities (cough, breathing, and speech) to determine COVID-19 status. We train a separate neural classification system on each modality, as well as a fused classification system on all three modalities together. Ablation studies are performed to understand the relationship between the individual and collective performance of the modalities. Additionally, we analyze the extent to which temporal and spectral features contribute to the COVID-19 status information contained in the audio signals.
{"title":"Learning Monocular 3D Human Pose Estimation With Skeletal Interpolation","authors":"Ziyi Chen, A. Sugimoto, S. Lai","doi":"10.1109/icassp43922.2022.9746410","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746410","url":null,"abstract":"Deep learning has achieved unprecedented accuracy for monocular 3D human pose estimation. However, current learning-based 3D human pose estimation still suffers from poor generalization. Inspired by skeletal animation, which is popular in game development and animation production, we put forward an simple, intuitive yet effective interpolation-based data augmentation approach to synthesize continuous and diverse 3D human body sequences to enhance model generalization. The Transformer-based lifting network, trained with the augmented data, utilizes the self-attention mechanism to perform 2D-to-3D lifting and successfully infer high-quality predictions in the qualitative experiment. The quantitative result of cross-dataset experiment demonstrates that our resulting model achieves superior generalization accuracy on the publicly available dataset.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2002 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128282646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regularization Using Denoising: Exact and Robust Signal Recovery","authors":"Ruturaj G. Gavaskar, K. Chaudhury","doi":"10.1109/ICASSP43922.2022.9747396","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747396","url":null,"abstract":"We consider the problem of signal reconstruction from linearly corrupted data using plug-and-play (PnP) regularization. As opposed to traditional sparsity-promoting regularizers, PnP uses an off-the-shelf denoiser within a proximal algorithm such as ISTA or ADMM for image reconstruction. Although PnP has become popular in the imaging community, its regularization capacity is not fully understood. For example, it is not known if PnP can in theory recover a signal from few noiseless measurements as in classical compressed sensing and if the recovery is robust. We explore these questions in this work and present some theoretical and experimental results. In particular, we prove that if the denoiser in question has low rank and if the ground- truth lies in the range of the denoiser, then it can be recovered exactly from noiseless measurements. To the best of knowledge, this is first such result. Furthermore, we show using numerical simulations that even if the aforementioned conditions are violated, PnP recovery is robust in practice. We formulate a theorem regarding the recovery error based on these observations.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129090664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders Step 2: Contribution of the Emergence of Phonetic Traits
Sondes Abderrazek, C. Fredouille, A. Ghio, M. Lalain, Christine Meunier, V. Woisard
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9746198

Abstract: Beyond the impressive performance deep learning has achieved on many tasks, one of the most important factors for its continued progress is work on interpretability, especially in a medical context. In recent work, we showed that a CNN-based model trained on normal speech achieves competitive performance on French phone classification and correlates well with different perceptual measures when exposed to disordered speech. This paper extends that work by focusing on interpretability. The goal is to gain insight into how neural representations shape the final phone classification task, so that they can then be used to explain the loss of intelligibility in disordered speech. We propose an original framework that relies, first, on neural activity and a novel per-neuron representation with respect to phone classification and, second, on identifying a set of neurons devoted to detecting specific phonetic traits in normal speech. When exposed to disordered speech, this set of neurons degrades, demonstrating the loss of specific phonetic traits in some of the patients involved and the potential of the proposed approach to provide information about speech alteration.
Distributed Particle Filters for State Tracking on the Stiefel Manifold Using Tangent Space Statistics
C. Bordin, Caio Gomes de Figueredo, Marcelo G. S. Bruno
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9746305

Abstract: This paper introduces a novel distributed diffusion algorithm for tracking the state of a dynamic system that evolves on the Stiefel manifold. To compress the information exchanged between nodes, the algorithm builds a Gaussian parametric approximation to the particles, which are first projected onto the tangent space of the Stiefel manifold and mapped to real vectors. Observations from neighboring nodes are then assimilated under a general nonlinear observation model. Performance is compared to that of competing linear-diffusion extended Kalman filters and other particle filters.
Training Strategies for Automatic Song Writing: A Unified Framework Perspective
Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, Qin Jin
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9746818

Abstract: Automatic song writing (ASW) typically involves four tasks: lyric-to-lyric generation, melody-to-melody generation, lyric-to-melody generation, and melody-to-lyric generation. Previous works have mainly focused on individual tasks without considering the correlation between them, and thus a unified framework to solve all four tasks has not yet been explored. In this paper, we propose a unified framework following the pre-training and fine-tuning paradigm to address all four ASW tasks with one model. To alleviate the data scarcity issue of paired lyric-melody data for lyric-to-melody and melody-to-lyric generation, we adopt two pre-training stages with unpaired data. In addition, we introduce a dual transformation loss to fully utilize paired data in the fine-tuning stage to enforce the weak correlation between melody and lyrics. We also design an objective music generation evaluation metric involving the chromatic rule and a more realistic setting, which removes some strict assumptions adopted in previous works. To the best of our knowledge, this work is the first to explore ASW for pop songs in Chinese. Extensive experiments demonstrate the effectiveness of the dual transformation loss and the unified model structure encompassing all four tasks. The experimental results also show that our proposed new evaluation metric aligns better with subjective opinion scores from human listeners.
Vision Transformer-Based Retina Vessel Segmentation with Deep Adaptive Gamma Correction
Hyunwoo Yu, J. Shim, Jaeho Kwak, J. Song, Suk-Ju Kang
ICASSP 2022. DOI: 10.1109/icassp43922.2022.9747597

Abstract: Accurate segmentation of retina vessels is essential for the early diagnosis of eye-related diseases. Recently, convolutional neural networks have shown remarkable performance in retina vessel segmentation. However, the complexity of edge structural information and the intensity distribution that varies from one retina image to another reduce segmentation performance. This paper proposes two novel deep learning-based modules, the channel attention vision transformer (CAViT) and deep adaptive gamma correction (DAGC), to tackle these issues. CAViT jointly applies efficient channel attention (ECA) and the vision transformer (ViT): the channel attention module models the interdependency among feature channels, while the ViT discriminates meaningful edge structures by considering the global context. The DAGC module predicts the optimal gamma correction value for each input image by jointly training a CNN with the segmentation network, so that all retina images are mapped to a unified intensity distribution. The experimental results show that the proposed method achieves superior performance compared to conventional methods on the widely used DRIVE and CHASE_DB1 datasets.