{"title":"Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising","authors":"Jianming Liu, Meng Yu, Yong Xu, Chao Weng, Shi-Xiong Zhang, Lianwu Chen, Dong Yu","doi":"10.1109/SLT48900.2021.9383519","DOIUrl":"https://doi.org/10.1109/SLT48900.2021.9383519","url":null,"abstract":"This paper proposes a new joint optimization framework for simultaneous dereverberation, acoustic echo cancellation, and denoising, which is motivated by the recently proposed con-volutional beamformer for simultaneous denoising and dereverberation. Using the echo aware mask based beamforming framework, the proposed algorithm could effectively deal with double-talk case and local inference, etc. The evaluations based on ERLE for echo only, and PESQ for double-talk demonstrate that the proposed algorithm could significantly improve the performance.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115142796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight Voice Anonymization Based on Data-Driven Optimization of Cascaded Voice Modification Modules","authors":"Hiroto Kai, Shinnosuke Takamichi, Sayaka Shiota, H. Kiya","doi":"10.1109/SLT48900.2021.9383535","DOIUrl":"https://doi.org/10.1109/SLT48900.2021.9383535","url":null,"abstract":"In this paper, we propose a voice anonymization framework based on data-driven optimization of cascaded voice modification modules. With increasing opportunities to use speech dialogue with machines nowadays, research regarding privacy protection of speaker information encapsulated in speech data is attracting attention. Anonymization, which is one of the methods for privacy protection, is based on signal processing manners, and the other one based on machine learning ones. Both approaches have a trade off between intelligibility of speech and degree of anonymization. The proposed voice anonymization framework utilizes advantages of machine learning and signal processing-based approaches to find the optimized trade off between the two. We use signal processing methods with training data for optimizing hyperparameters in a data-driven manner. The speech is modified using cascaded lightweight signal processing methods and then evaluated using black-box ASR and ASV, respectively. Our proposed method succeeded in deteriorating the speaker recognition rate by approximately 22% while simultaneously improved the speech recognition rate by over 3% compared to a signal processing-based conventional method.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115611591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"End-To-End Lip Synchronisation Based on Pattern Classification","authors":"You Jin Kim, Hee-Soo Heo, Soo-Whan Chung, Bong-Jin Lee","doi":"10.1109/SLT48900.2021.9383616","DOIUrl":"https://doi.org/10.1109/SLT48900.2021.9383616","url":null,"abstract":"The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on proxy tasks such as cross-modal similarity learning, and then computed similarities between audio and video frames using a sliding window approach. While these methods demonstrate satisfactory performance, the networks are not trained directly on the task. To this end, we propose an end-to-end trained network that can directly predict the offset between an audio stream and the corresponding video stream. The similarity matrix between the two modalities is first computed from the features, then the inference of the offset can be considered to be a pattern recognition problem where the matrix is considered equivalent to an image. The feature extractor and the classifier are trained jointly. We demonstrate that the proposed approach outperforms the previous work by a large margin on LRS2 and LRS3 datasets.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125841489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Attention Fusion for Target Speaker Extraction","authors":"Hiroshi Sato, Tsubasa Ochiai, K. Kinoshita, Marc Delcroix, T. Nakatani, S. Araki","doi":"10.1109/SLT48900.2021.9383539","DOIUrl":"https://doi.org/10.1109/SLT48900.2021.9383539","url":null,"abstract":"Target speaker extraction, which aims at extracting a target speaker’s voice from a mixture of voices using audio, visual or locational clues, has received much interest. Recently an audio-visual target speaker extraction has been proposed that extracts target speech by using complementary audio and visual clues. Although audio-visual target speaker extraction offers a more stable performance than single modality methods for simulated data, its adaptation towards realistic situations has not been fully explored as well as evaluations on real recorded mixtures. One of the major issues to handle realistic situations is how to make the system robust to clue corruption because in real recordings both clues may not be equally reliable, e.g. visual clues may be affected by occlusions. In this work, we propose a novel attention mechanism for multi-modal fusion and its training methods that enable to effectively capture the reliability of the clues and weight the more reliable ones. Our proposals improve signal to distortion ratio (SDR) by 1.0 dB over conventional fusion mechanisms on simulated data. Moreover, we also record an audio-visual dataset of simultaneous speech with realistic visual clue corruption and show that audio-visual target speaker extraction with our proposals successfully work on real data.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133777402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient corpus design for wake-word detection","authors":"Delowar Hossain, Yoshinao Sato","doi":"10.1109/SLT48900.2021.9383569","DOIUrl":"https://doi.org/10.1109/SLT48900.2021.9383569","url":null,"abstract":"Wake-word detection is an indispensable technology for preventing virtual voice agents from being unintentionally triggered. Although various neural networks were proposed for wake-word detection, less attention has been paid to efficient corpus design, which we address in this study. For this purpose, we collected speech data via a crowdsourcing platform and evaluated the performance of several neural networks when different subsets of the corpus were used for training. The results reveal the following requirements for efficient corpus design to produce a lower misdetection rate: (1) short segments of continuous speech can be used as negative samples, but they are not as effective as random words; (2) utterances of \"adversarial\" words, i.e., phonetically similar words to a wake-word, contribute to improving performance significantly when they are used as negative samples; (3) it is preferable for individual speakers to provide both positive and negative samples; (4) increasing the number of speakers is better than increasing the number of repetitions of a wake-word by each speaker.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122214371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speaker-Independent Visual Speech Recognition with the Inception V3 Model","authors":"Timothy Israel Santos, Andrew Abel, N. Wilson, Yan Xu","doi":"10.1109/SLT48900.2021.9383540","DOIUrl":"https://doi.org/10.1109/SLT48900.2021.9383540","url":null,"abstract":"The natural process of understanding speech involves combining auditory and visual cues. CNN based lip reading systems have become very popular in recent years. However, many of these systems consider lipreading to be a black box problem, with limited detailed performance analysis. In this paper, we performed transfer learning by training the Inception v3 CNN model, which has pre-trained weights produced from IMAGENET, with the GRID corpus, delivering good speech recognition results, with 0.61 precision, 0.53 recall, and 0.51 F1-score. The lip reading model was able to automatically learn pertinent features, demonstrated using visualisation, and achieve speaker-independent results comparable to human lip readers on the GRID corpus. We also identify limitations that match those of humans, therefore limiting potential deep learning performance in real world situations.","PeriodicalId":243211,"journal":{"name":"2021 IEEE Spoken Language Technology Workshop (SLT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115077393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}