2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU): Latest Publications

Leveraging Language ID in Multilingual End-to-End Speech Recognition
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003870
Austin Waters, Neeraj Gaur, Parisa Haghani, P. Moreno, Zhongdi Qu
Abstract: Recent advances in end-to-end speech recognition have made it possible to build multilingual models capable of recognizing speech in multiple languages. Multilingual models can outperform their monolingual counterparts, depending on the amount of training data and the relatedness of the languages. However, in some cases these models rely on having perfect knowledge of the language being spoken; that is, they expect to be provided with an external language ID that augments the input features or modulates internal layers of the network. In this paper, we introduce a novel technique for inferring the language ID in a streaming fashion using RNN-T, along with a novel loss function that pressures the model to identify the language after as few frames as possible. The output of this streaming language-ID model is used in training and inference of a multilingual recognition model. We show the effectiveness of our approach through experiments on two sets of languages, one consisting of different dialects of Arabic and the other consisting of Nordic languages, Finnish and Dutch.
Citations: 23
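The early-identification pressure described in this abstract lends itself to a frame-weighted classification loss. Below is a minimal sketch assuming a unidirectional LSTM language-ID head and an exponentially decaying frame weight; the paper's exact loss and its integration with RNN-T are not given here, so `StreamingLID`, `early_id_loss`, and `decay` are illustrative choices, not the authors' implementation.

```python
# A sketch of a streaming language-ID head with a loss that encourages
# early identification. The weighting scheme is an assumption; the
# paper's exact loss formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamingLID(nn.Module):
    def __init__(self, feat_dim: int, num_langs: int, hidden: int = 256):
        super().__init__()
        # A unidirectional LSTM keeps the model streamable: the
        # prediction at frame t depends only on frames 1..t.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_langs)

    def forward(self, feats):                      # feats: (B, T, feat_dim)
        out, _ = self.rnn(feats)
        return self.head(out)                      # logits: (B, T, num_langs)

def early_id_loss(logits, lang_ids, decay: float = 0.05):
    """Per-frame cross-entropy, weighted so that errors on early frames
    cost more; this pressures the model to commit to a language ID
    after as few frames as possible."""
    B, T, L = logits.shape
    targets = lang_ids.unsqueeze(1).expand(B, T)   # same label at every frame
    ce = F.cross_entropy(logits.reshape(B * T, L),
                         targets.reshape(B * T),
                         reduction="none").view(B, T)
    weights = torch.exp(-decay * torch.arange(T, device=ce.device,
                                              dtype=ce.dtype))
    return (ce * weights).sum() / weights.sum() / B

model = StreamingLID(feat_dim=80, num_langs=4)
logits = model(torch.randn(2, 100, 80))            # two 1-second utterances
loss = early_id_loss(logits, torch.tensor([0, 3]))
loss.backward()
```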
Speech Reveals Future Risk of Developing Dementia: Predictive Dementia Screening from Biographic Interviews
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003908
Jochen Weiner, C. Frankenberg, J. Schröder, Tanja Schultz
Abstract: Alzheimer's disease is a progressive, incurable condition for which the success of any symptomatic therapy depends crucially on its starting time; ideally, therapy starts before the disease has caused any cognitive impairments. Our work aims at developing speech-based dementia screening methods that detect dementia as early as possible. Here, we aim to predict the onset even before clinical screening tests can diagnose the disease. Using the longitudinal ILSE study, we automatically extract features from biographic interviews and predict the development of dementia 5 and 12 years into the future. Our prediction system achieves 73.3% and 75.7% unweighted average recall (UAR), respectively, clearly outperforming predictions based on prior diagnoses or disease prevalence. Thus, the automated analysis of spoken interviews offers a highly effective prediction procedure that allows for easy-to-use, cost-effective casual testing.
Citations: 13
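UAR, the metric reported above, is simply macro-averaged recall: the mean of per-class recalls, so a majority-class predictor scores 0.5 on a two-class task regardless of prevalence. A small sketch using scikit-learn (the labels are illustrative, not ILSE data):

```python
# UAR (unweighted average recall): sklearn's macro-averaged recall is
# exactly this metric. Labels below are made up for illustration.
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # 1 = develops dementia later
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

uar = recall_score(y_true, y_pred, average="macro")
# Prevalence baseline: always predicting the majority class gives
# recall 1.0 on one class and 0.0 on the other, i.e. UAR = 0.5.
print(f"UAR = {uar:.3f} (majority-class baseline = 0.500)")
```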
Long Range Acoustic and Deep Features Perspective on ASVspoof 2019
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003845
Rohan Kumar Das, Jichen Yang, Haizhou Li
Abstract: To secure automatic speaker verification (ASV) systems from intruders, robust countermeasures for spoofing attack detection are required. The ASVspoof challenge series provides a shared anti-spoofing task. The most recent edition, ASVspoof 2019, focuses on attacks by both synthetic and replay speech, referred to as logical and physical access attacks, respectively. In our ASVspoof 2019 submission, we considered novel countermeasures based on long range acoustic features, which are unique in many ways as they are derived using the octave power spectrum and subbands, as opposed to the commonly used linear power spectrum. In the post-challenge study, we further investigate the use of deep features that enhance the discriminative ability between genuine and spoofed speech. In this paper, we summarize the findings from the perspective of long range acoustic and deep features for spoof detection, and present a comprehensive analysis of the nature of different kinds of spoofing attacks and system development.
Citations: 51
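As a rough illustration of octave-based subband features, the sketch below pools a linear FFT power spectrum into octave-spaced bands before taking logs, in contrast to linearly spaced bands. The band edges, frame length, and 62.5 Hz base frequency are assumptions, not the paper's exact front end.

```python
# A minimal sketch of octave-band log energies: the power spectrum is
# pooled over octave-spaced subbands rather than linear bands.
import numpy as np

def octave_band_energies(frame, sr=16000, f_low=62.5):
    spec = np.abs(np.fft.rfft(frame)) ** 2          # linear power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    edges, f = [], f_low
    while f < sr / 2:                               # 62.5, 125, 250, ... Hz
        edges.append(f)
        f *= 2.0
    edges.append(sr / 2)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):       # bins below f_low discarded
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(spec[mask].sum())
    return np.log(np.asarray(bands) + 1e-10)        # log energy per octave

frame = np.random.randn(512)                        # one 32 ms frame at 16 kHz
print(octave_band_energies(frame))                  # 7 octave-band energies
```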
Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003775
K. Sim, F. Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, P. Zadražil, Harry Zhang, Leif T. Johnson, Giovanni Motta, Lillian Zhou
Abstract: We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amount of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose keyword-dependent precision and recall metrics to measure vocabulary acquisition performance. We evaluate the algorithms on a dataset that we designed to contain names of persons that are difficult to recognize; the baseline recall rate for proper names in this dataset is therefore very low, at 2.4%. A data synthesis approach we developed brings it to 48.6%, with no need for speech input from the user. With speech input, if the user corrects only the names, the name recall rate improves to 64.4%; if the user corrects all the recognition errors, we achieve the best recall of 73.5%. To eliminate the need to upload user data and store personalized models on a server, we focus on performing the entire personalization workflow on a mobile device.
Citations: 53
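The keyword-dependent precision and recall proposed above can be computed by counting, per keyword, occurrences in reference versus hypothesis transcripts. A minimal sketch (the counting convention and example data are assumptions, not the paper's evaluation code):

```python
# Keyword-dependent precision/recall for vocabulary acquisition:
# per keyword, compare hypothesized vs. actual occurrence counts.
from collections import Counter

def keyword_prf(refs, hyps, keywords):
    stats = {}
    for kw in keywords:
        tp = fp = fn = 0
        for ref, hyp in zip(refs, hyps):
            r = Counter(ref.split())[kw]            # occurrences in reference
            h = Counter(hyp.split())[kw]            # occurrences in hypothesis
            tp += min(r, h)
            fp += max(h - r, 0)
            fn += max(r - h, 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        stats[kw] = (prec, rec)
    return stats

refs = ["call zadrazil now", "text zadrazil hello"]
hyps = ["call zadrasil now", "text zadrazil hello"]
print(keyword_prf(refs, hyps, ["zadrazil"]))        # {'zadrazil': (1.0, 0.5)}
```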
Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003744
Rao Ma, Qi Liu, Kai Yu
Abstract: The long short-term memory language model (LSTM LM) has been widely investigated for the large vocabulary continuous speech recognition (LVCSR) task. Despite the excellent performance of the LSTM LM, its use in resource-constrained environments, such as portable devices, is limited by its high memory consumption. Binarized language models have been proposed to achieve significant memory reduction, at the cost of performance degradation at high compression ratios. In this paper, we propose a soft binarization approach to recover the performance of the binarized LSTM LM. Experiments show that the proposed method can achieve a high compression rate of 30× with almost no performance loss in both language modeling and speech recognition tasks.
Citations: 5
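One common way to realize soft binarization is to pass weights through a tanh whose temperature is annealed during training, so the network gradually approaches its hard-binarized form; at deployment each matrix is stored as sign bits plus a single scale. The sketch below shows this idea; the annealing schedule and layer design are assumptions, not the paper's exact method.

```python
# A minimal sketch of soft binarization training for a linear layer.
import torch
import torch.nn as nn

class SoftBinaryLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.alpha = 1.0                      # annealed upward during training

    def forward(self, x):
        if self.training:
            # Soft, differentiable surrogate: tanh approaches sign as
            # alpha grows, easing the network into binary weights.
            w = torch.tanh(self.alpha * self.weight)
        else:
            # Hard binarization at inference: sign bits plus one scale
            # per matrix, i.e. roughly 32x less memory than fp32.
            w = torch.sign(self.weight) * self.weight.abs().mean()
        return x @ w.t()

layer = SoftBinaryLinear(256, 256)
layer.alpha = 5.0                             # later training stage: sharper tanh
y_train = layer(torch.randn(4, 256))          # soft path
layer.eval()
y_infer = layer(torch.randn(4, 256))          # binarized path
```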
Exploring Model Units and Training Strategies for End-to-End Speech Recognition
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003834
Mingkun Huang, Yizhou Lu, Lan Wang, Y. Qian, Kai Yu
Abstract: In this work, we explore end-to-end speech recognition models (CTC, RNN-Transducer and attention-based models) with different model units (character, wordpiece and word) and various training strategies. We show that wordpiece units outperform character units for all end-to-end systems on the Switchboard Hub5'00 benchmark. To improve the performance of end-to-end systems, we propose a multi-stage pretraining strategy, which gives 25.0% and 18.0% relative improvements over training from scratch for attention and RNN-T models, respectively, with wordpiece units. We achieve state-of-the-art performance on the Switchboard+Fisher-2000h task, outperforming all prior work. Together with other training strategies such as label smoothing and data augmentation, we achieve 5.9%/12.1% WER on the Switchboard/CallHome test sets without using any external language model. This is a new performance milestone for a single end-to-end system, and it is also much better than the previously published best hybrid system, which achieves 6.7%/12.5% on the same sets.
Citations: 8
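Wordpiece inventories of the kind compared above are typically trained with a subword tokenizer such as SentencePiece. A minimal sketch follows; the vocabulary size, model type, and file paths are illustrative, not the paper's configuration, and `train_transcripts.txt` is an assumed input file of one transcript per line.

```python
# A sketch of building wordpiece units with SentencePiece.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",    # assumed path: one transcript per line
    model_prefix="swbd_wp",
    vocab_size=1000,                  # a typical wordpiece inventory for E2E ASR
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="swbd_wp.model")
print(sp.encode("speech recognition is fun", out_type=str))
# e.g. ['▁speech', '▁recog', 'ni', 'tion', '▁is', '▁fun']
```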
On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003939
Berrak Sisman, Mingyang Zhang, M. Dong, Haizhou Li
Abstract: Cross-lingual voice conversion (VC) aims to convert the source speaker's voice to sound like that of the target speaker when the source and target speakers speak different languages. In this paper, we propose to use Generative Adversarial Networks (GANs) for cross-lingual voice conversion. We build on the Variational Autoencoding Wasserstein GAN (VAW-GAN) and the cycle-consistent adversarial network (CycleGAN), which are known to be effective for mono-lingual voice conversion. As cross-lingual voice conversion must convert voices across different phonetic systems, it is more challenging than mono-lingual voice conversion. Using VAW-GAN and CycleGAN, we successfully convert the speaker identity while carrying over the source speaker's linguistic content. The proposed idea is unique in that it relies neither on bilingual data and their alignment nor on any external process such as ASR. Moreover, it works with a limited amount of training data of any two languages. To the best of our knowledge, this is the first comprehensive study of Generative Adversarial Networks for cross-lingual voice conversion. In the experiments, we achieve high-quality converted voices that perform as well as or better than mono-lingual voice conversion.
Citations: 25
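The reason CycleGAN needs no parallel or bilingual data is its cycle-consistency term: mapping source features to the target speaker and back must reconstruct the input, which preserves linguistic content. A minimal sketch of the generator-side losses, assuming LSGAN-style adversarial terms and toy linear generators over spectral features (all layer choices here are illustrative placeholders, not the paper's networks):

```python
# A sketch of CycleGAN generator losses for voice conversion.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 24                                    # e.g. 24 MCEP coefficients per frame
G, Fn = nn.Linear(dim, dim), nn.Linear(dim, dim)   # x->y and y->x generators
D_x, D_y = nn.Linear(dim, 1), nn.Linear(dim, 1)    # per-frame discriminators

def cyclegan_generator_loss(x, y, lam=10.0):
    fake_y, fake_x = G(x), Fn(y)
    # LSGAN-style adversarial terms: each generator tries to make its
    # output look real to the corresponding discriminator.
    adv = F.mse_loss(D_y(fake_y), torch.ones_like(D_y(fake_y))) \
        + F.mse_loss(D_x(fake_x), torch.ones_like(D_x(fake_x)))
    # Cycle consistency: x -> y' -> x'' must reconstruct x; this is
    # what carries the source speaker's linguistic content across
    # without parallel, bilingual, or ASR supervision.
    cyc = F.l1_loss(Fn(fake_y), x) + F.l1_loss(G(fake_x), y)
    return adv + lam * cyc

loss = cyclegan_generator_loss(torch.randn(32, dim), torch.randn(32, dim))
loss.backward()
```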
Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9003760
Takashi Fukuda, Samuel Thomas
Abstract: Training of mixed bandwidth acoustic models has recently been realized by incorporating special Mel filterbanks. To fit information into every filterbank bin available across both narrowband and wideband data, these filterbanks pad zeros at the high-frequency ranges of narrowband data. Although these methods succeed in decreasing word error rates (WER) on broadband data, they fail to improve on narrowband signals. In this paper, we propose methods to mitigate these effects with generalized knowledge distillation. In our method, specialized teacher networks are first trained on lossless acoustic features with full-scale Mel filterbanks. While training student networks, privileged knowledge from these teacher networks is then used to compensate for the information missing at high frequencies introduced by the special Mel filterbanks. We show the benefit of the proposed technique over traditional methods for both narrowband (10% relative WER improvement) and wideband data (7.5% relative WER improvement) on the Aurora 4 task.
Citations: 4
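Generalized knowledge distillation of this kind is usually implemented as a weighted sum of the hard-label cross-entropy and a temperature-smoothed KL term against the teacher's outputs. A minimal sketch, assuming typical temperature and mixing values rather than the paper's settings:

```python
# A sketch of a distillation loss: the student (narrowband input) also
# matches soft targets from a teacher trained on full-bandwidth features.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T=2.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, targets)
    # KL between temperature-smoothed distributions; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    return alpha * hard + (1.0 - alpha) * soft

s = torch.randn(8, 3000, requires_grad=True)   # student senone logits
t = torch.randn(8, 3000)                       # teacher logits (privileged)
y = torch.randint(0, 3000, (8,))               # frame-level senone targets
distillation_loss(s, t, y).backward()
```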
Small-Footprint Keyword Spotting with Graph Convolutional Network
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9004005
Xi Chen, S. Yin, Dandan Song, P. Ouyang, Leibo Liu, Shaojun Wei
Abstract: Despite the recent successes of deep neural networks, it remains challenging to achieve high-precision keyword spotting (KWS) on resource-constrained devices. In this study, we propose a novel context-aware and compact architecture for the keyword spotting task. Based on residual connections and a bottleneck structure, we design a compact and efficient network for KWS. To leverage the long-range dependencies and global context of the convolutional feature maps, a graph convolutional network is introduced to encode non-local relations. Evaluated on the Google Speech Commands Dataset, the proposed method achieves state-of-the-art performance and outperforms prior work by a large margin at lower computational cost.
Citations: 18
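One way to graft a graph convolution onto convolutional feature maps is to treat each time-frequency position as a graph node and build the adjacency from pairwise feature similarity, so every position can attend to every other one. The sketch below shows this pattern; the similarity-based adjacency and the residual wiring are assumptions, not necessarily the paper's construction.

```python
# A minimal sketch of a graph convolution over feature-map positions
# to capture non-local context in a KWS network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapGCN(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, fmap):                       # fmap: (B, C, H, W)
        B, C, H, W = fmap.shape
        nodes = fmap.flatten(2).transpose(1, 2)    # (B, H*W, C): one node per position
        # Row-normalized adjacency from pairwise similarity: every
        # time-frequency position is connected to every other one.
        adj = F.softmax(nodes @ nodes.transpose(1, 2) / C ** 0.5, dim=-1)
        out = F.relu(self.proj(adj @ nodes))       # one graph-convolution step
        return fmap + out.transpose(1, 2).reshape(B, C, H, W)  # residual

gcn = FeatureMapGCN(32)
y = gcn(torch.randn(4, 32, 10, 8))                # e.g. pooled spectral feature maps
```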
Incremental Lattice Determinization for WFST Decoders
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | Pub Date: 2019-12-01 | DOI: 10.1109/ASRU46091.2019.9004006
Zhehuai Chen, M. Yarmohammadi, Hainan Xu, Hang Lv, Lei Xie, Daniel Povey, S. Khudanpur
Abstract: We introduce a lattice determinization algorithm that can operate incrementally. That is, a word-level lattice can be generated for a partial utterance, and then, once more audio has been processed, a word-level lattice for the extended utterance can be obtained without redoing all the work of lattice determinization. This is relevant for ASR decoders such as those used in Kaldi, which first generate a state-level lattice and then convert it to a word-level lattice using a determinization algorithm in a special semiring. Our incremental determinization algorithm is useful when word-level lattices are needed before the end of the utterance, and it also reduces the latency due to determinization at the end of the utterance.
Citations: 0
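A toy way to see the incremental idea is subset construction over a growing acceptor: previously determinized prefixes are reused, and only frontier subsets are expanded when new arcs arrive. The sketch below is a deliberately simplified illustration; real WFST lattice determinization also tracks weights and output strings in a special semiring, and the frontier handling here assumes new arcs attach only to states not yet expanded.

```python
# A toy sketch of incremental determinization by subset construction.
from collections import defaultdict

class IncrementalDeterminizer:
    def __init__(self, start_state=0):
        self.arcs = defaultdict(list)          # nfa_state -> [(label, dest)]
        self.dfa = {}                          # frozenset -> {label: frozenset}
        start = frozenset([start_state])
        self.dfa[start] = {}
        self.frontier = {start}                # subsets awaiting more audio

    def add_arcs(self, new_arcs):
        """Append arcs for newly decoded audio, then expand only subsets
        reachable from the current frontier, reusing earlier work."""
        for src, label, dst in new_arcs:
            self.arcs[src].append((label, dst))
        stack, self.frontier = list(self.frontier), set()
        while stack:
            subset = stack.pop()
            moves = defaultdict(set)
            for s in subset:
                for label, dst in self.arcs[s]:
                    moves[label].add(dst)
            if not moves:
                self.frontier.add(subset)      # expand later, when more arrives
                continue
            for label, dests in moves.items():
                nxt = frozenset(dests)
                self.dfa[subset][label] = nxt
                if nxt not in self.dfa:
                    self.dfa[nxt] = {}
                    stack.append(nxt)

det = IncrementalDeterminizer()
det.add_arcs([(0, "a", 1), (0, "a", 2)])       # partial utterance
det.add_arcs([(1, "b", 3), (2, "b", 4)])       # more audio arrives later
print(det.dfa[frozenset({1, 2})])              # {'b': frozenset({3, 4})}
```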