Title: Robust Belief State Space Representation for Statistical Dialogue Managers Using Deep Autoencoders
Authors: Fotios Lygerakis, Vassilios Diakoloulas, M. Lagoudakis, M. Kotti
DOI: 10.1109/ASRU46091.2019.9003871 (https://doi.org/10.1109/ASRU46091.2019.9003871)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: Statistical Dialogue Systems (SDS) have proved their enormous potential over the past few years. However, the lack of efficient and robust representations of the belief state (BS) space prevents them from revealing their full potential. There is a great need for automatic BS representations that replace the old hand-crafted, variable-length ones. To tackle these problems, we introduce a novel use of Autoencoders (AEs). Our goal is to obtain a low-dimensional, fixed-length, compact, yet robust representation of the BS space. We investigate the use of a dense AE, a Denoising AE (DAE), and a Variational Denoising AE (VDAE), which we combine with GP-SARSA to learn dialogue policies in the PyDial toolkit. In this framework, the BS is normally represented in a relatively compact, but still redundant, summary space obtained through a heuristic mapping of the original master space. We show that all the proposed AE-based representations consistently outperform the summary BS representation. In particular, as the Semantic Error Rate (SER) increases, the DAE/VDAE-based representations achieve state-of-the-art and sample-efficient performance.
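
A minimal PyTorch sketch of a denoising autoencoder of the kind described, mapping a fixed-length belief-state vector to a compact code that a policy learner such as GP-SARSA could consume. The layer sizes, noise level, and class name are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Compress a fixed-length belief-state vector into a low-dimensional code."""
    def __init__(self, input_dim=268, code_dim=32, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, belief_state):
        # Corrupt the input so the learned code is robust to noise (e.g. semantic errors).
        noisy = belief_state + self.noise_std * torch.randn_like(belief_state)
        code = self.encoder(noisy)
        return self.decoder(code), code

# Training minimizes reconstruction error; the code vector then replaces the
# hand-crafted summary space as input to the dialogue policy learner.
dae = DenoisingAutoencoder()
x = torch.rand(16, 268)                      # a batch of belief-state vectors
recon, code = dae(x)
loss = nn.functional.mse_loss(recon, x)
```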

Title: Joint Optimization of Classification and Clustering for Deep Speaker Embedding
Authors: Zhiming Wang, K. Yao, Shuo Fang, Xiaolong Li
DOI: 10.1109/ASRU46091.2019.9003860 (https://doi.org/10.1109/ASRU46091.2019.9003860)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: This paper proposes a method to train deep speaker embeddings end-to-end that jointly optimizes classification and clustering. A large-margin softmax loss is used to reduce classification errors, and a novel large-margin Gaussian mixture loss is proposed to improve clustering. With the joint optimization, the learned embeddings capture segment-level acoustic representations from variable-length speech segments, discriminating between speakers and replicating the densities of speaker clusters. We compare performance with alternative methods on the large-scale text-independent speaker recognition dataset VoxCeleb1 [1] and observe that the proposed method outperforms them significantly, achieving new state-of-the-art results on the dataset. Moreover, because of the joint optimization, the method converges faster and to a better optimum than using the classification loss alone. Our results suggest great potential for the joint optimization of classification and clustering in speaker verification and identification.
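
The sketch below illustrates one way to combine a margin-based classification loss with a clustering term that pulls embeddings toward learned per-speaker centers. It approximates the idea of joint classification/clustering optimization; it is not the paper's exact large-margin Gaussian mixture formulation, and all hyperparameters are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSpeakerLoss(nn.Module):
    """Illustrative joint loss: additive-margin softmax (classification)
    plus a squared distance to learned per-speaker centers (clustering)."""
    def __init__(self, embed_dim, num_speakers, margin=0.2, scale=30.0, cluster_weight=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_speakers, embed_dim))
        self.centers = nn.Parameter(torch.randn(num_speakers, embed_dim))
        self.margin, self.scale, self.cluster_weight = margin, scale, cluster_weight

    def forward(self, embeddings, labels):
        # Classification term: cosine similarity with an additive margin on the true class.
        logits = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        logits_m = logits - self.margin * F.one_hot(labels, logits.size(1))
        cls_loss = F.cross_entropy(self.scale * logits_m, labels)
        # Clustering term: pull each embedding toward its speaker's center.
        clu_loss = ((embeddings - self.centers[labels]) ** 2).sum(dim=1).mean()
        return cls_loss + self.cluster_weight * clu_loss

loss_fn = JointSpeakerLoss(embed_dim=256, num_speakers=1000)
emb = torch.randn(32, 256)                       # segment-level speaker embeddings
labels = torch.randint(0, 1000, (32,))
print(loss_fn(emb, labels))
```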

Title: On Temporal Context Information for Hybrid BLSTM-Based Phoneme Recognition
Authors: Timo Lohrenz, Maximilian Strake, T. Fingscheidt
DOI: 10.1109/ASRU46091.2019.9003946 (https://doi.org/10.1109/ASRU46091.2019.9003946)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: The modern approach to including long-term temporal context information in speech recognition systems is the use of recurrent neural networks, e.g., bi-directional long short-term memory (BLSTM) networks. In this paper, we decouple the BLSTM from a preceding CNN-based feature extractor network, allowing us to investigate the use of temporal context in both models in a modular fashion. Accordingly, we train the BLSTMs on posteriors stemming from preceding CNNs that use various amounts of limited context in their input layer, and investigate to what extent the BLSTM can effectively make use of its long-term modeling capabilities. We show that it is beneficial to train the BLSTM on posteriors stemming from a temporal context-free acoustic model. Remarkably, the best-performing combination is a large-context CNN acoustic model (expected), followed by a BLSTM trained on context-free CNN output posteriors (surprising).
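
A small PyTorch sketch of the decoupled pipeline studied here: a CNN acoustic model with no temporal context in its input layer (kernel size 1, purely illustrative) emits frame-level phoneme posteriors, and a BLSTM is trained on those posterior sequences. Dimensions and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

NUM_PHONEMES, FEAT_DIM = 40, 120

# CNN acoustic model with kernel_size=1, i.e. no temporal context in its input layer.
cnn_am = nn.Sequential(
    nn.Conv1d(FEAT_DIM, 256, kernel_size=1), nn.ReLU(),
    nn.Conv1d(256, NUM_PHONEMES, kernel_size=1))

class PosteriorBLSTM(nn.Module):
    """BLSTM trained on the CNN's frame-level posteriors rather than raw features."""
    def __init__(self, num_phonemes=NUM_PHONEMES, hidden=320):
        super().__init__()
        self.blstm = nn.LSTM(num_phonemes, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_phonemes)

    def forward(self, posteriors):                    # (batch, time, num_phonemes)
        h, _ = self.blstm(posteriors)
        return self.out(h)

features = torch.rand(8, FEAT_DIM, 200)               # (batch, features, time)
with torch.no_grad():                                  # the CNN stage is trained separately
    posteriors = cnn_am(features).softmax(dim=1).transpose(1, 2)
logits = PosteriorBLSTM()(posteriors)                  # long-term context modeled here only
```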

Title: Development of Voice Spoofing Detection Systems for 2019 Edition of Automatic Speaker Verification and Countermeasures Challenge
Authors: João Monteiro, Md. Jahangir Alam
DOI: 10.1109/ASRU46091.2019.9003792 (https://doi.org/10.1109/ASRU46091.2019.9003792)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: A robust speaker verification system is expected to provide high recognition accuracy not only in adverse environments but also in the presence of spoofing attacks, which makes voice spoofing detection crucial for protecting automatic speaker verification systems from security breaches. In this work, we present anti-spoofing systems developed to tackle the spoofing attacks introduced in the ASVspoof 2019 challenge. We employ frame-level descriptors such as the discrete Fourier transform, as well as constant-Q-transform-based spectral and cepstral features, as countermeasures. These descriptors are either used on their own with a spoofing detection classifier, or in tandem with deep bottleneck features, i.e., approximate posteriors parametrized by a neural network designed to discriminate between bona fide and spoofed signals. Fisher vector encodings and i-vector representations are further learned from the frame-level descriptors. For modeling, we employ two classification strategies. Finally, we build end-to-end anti-spoofing systems using modified versions of light convolutional neural networks as well as well-known ResNets. With our primary system for the logical access task and a single end-to-end system for the physical access task, we attain significant improvements over the two baseline systems.
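
As an illustration of one of the frame-level descriptors mentioned above, the sketch below computes simplified constant-Q-based cepstral features with librosa. It omits the uniform resampling step of true CQCC and all downstream classifiers, and the parameter values and file path are assumptions.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def cqt_cepstral_features(wav_path, n_bins=84, bins_per_octave=12, n_ceps=20):
    """Rough CQT-based cepstral features (a simplified stand-in for CQCC)."""
    y, sr = librosa.load(wav_path, sr=16000)
    cqt = np.abs(librosa.cqt(y, sr=sr, n_bins=n_bins, bins_per_octave=bins_per_octave))
    log_cqt = np.log(cqt + 1e-8)
    # DCT along the frequency axis gives cepstral-style coefficients per frame.
    return dct(log_cqt, axis=0, norm='ortho')[:n_ceps].T   # (frames, n_ceps)

# A spoofing countermeasure would then score utterance-level statistics of these
# frame-level descriptors, e.g. with a GMM back-end or an i-vector/Fisher-vector encoder.
```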

Title: Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models
Authors: Abhinav Garg, Dhananjaya N. Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim
DOI: 10.1109/ASRU46091.2019.9003936 (https://doi.org/10.1109/ASRU46091.2019.9003936)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: In this paper, we propose a refined multi-stage, multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training scheme based on three levels of architectural granularity, namely a character encoder, a byte pair encoding (BPE) based encoder, and an attention decoder, is proposed. In addition, multi-task learning based on two levels of linguistic granularity, character and BPE, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show ~35% and ~10% relative improvement over their baselines for the smaller and bigger models, respectively. Our models achieve word error rates (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively, after fusion with a long short-term memory (LSTM) based external language model (LM).
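
A rough PyTorch sketch of two ingredients described above: multi-task output heads at character and BPE granularity over a shared encoder, and transfer of pre-trained encoder weights between training stages. It omits the online attention decoder, and all sizes, vocabularies, and names are illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskEncoder(nn.Module):
    """Shared encoder with character-level and BPE-level output heads."""
    def __init__(self, feat_dim=80, hidden=512, n_chars=30, n_bpe=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4, batch_first=True)
        self.char_head = nn.Linear(hidden, n_chars)
        self.bpe_head = nn.Linear(hidden, n_bpe)

    def forward(self, feats):
        enc, _ = self.encoder(feats)
        return self.char_head(enc), self.bpe_head(enc)

# Stage 1: a model is pre-trained with character targets (stand-in below).
stage1 = MultiTaskEncoder()

# Stage 2: a new model reuses the stage-1 encoder weights before BPE-level training.
stage2 = MultiTaskEncoder()
stage2.encoder.load_state_dict(stage1.encoder.state_dict())

feats = torch.rand(4, 300, 80)
char_logits, bpe_logits = stage2(feats)
# total_loss = bpe_loss(bpe_logits, bpe_targets) + lam * char_loss(char_logits, char_targets)
```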

Title: WaveNet Factorization with Singular Value Decomposition for Voice Conversion
Authors: Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li
DOI: 10.1109/ASRU46091.2019.9003801 (https://doi.org/10.1109/ASRU46091.2019.9003801)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: The WaveNet vocoder has shown a great advantage over traditional vocoders in terms of voice quality. However, it usually requires a relatively large amount of speech data to train a speaker-dependent WaveNet vocoder. It therefore remains a challenge to build a high-quality WaveNet vocoder for low-resource tasks such as voice conversion, where speech samples are limited in real applications. We propose to use singular value decomposition (SVD) to reduce the number of WaveNet parameters while maintaining output voice quality. Specifically, we apply SVD to the dilated convolution layers and impose a semi-orthogonal constraint to improve performance. Experiments conducted on the CMU-ARCTIC database show that, compared with the original WaveNet vocoder, the proposed method maintains similar performance in terms of both quality and similarity while using much less training data.
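
The core parameter-reduction step can be sketched as a truncated SVD of a dilated convolution's weights, splitting it into a low-rank dilated convolution followed by a 1x1 convolution. This PyTorch sketch omits the semi-orthogonal constraint and any fine-tuning, and the rank and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

def svd_factorize_conv1d(conv: nn.Conv1d, rank: int) -> nn.Sequential:
    """Replace a (dilated) Conv1d by a low-rank pair: a rank-`rank` dilated conv
    followed by a 1x1 conv, obtained from a truncated SVD of the original weights."""
    out_ch, in_ch, k = conv.weight.shape
    W = conv.weight.detach().reshape(out_ch, in_ch * k)          # (out, in*k)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]                   # truncate to the given rank

    bottom = nn.Conv1d(in_ch, rank, k, dilation=conv.dilation,
                       padding=conv.padding, bias=False)
    top = nn.Conv1d(rank, out_ch, 1, bias=conv.bias is not None)
    bottom.weight.data = Vh.reshape(rank, in_ch, k)
    top.weight.data = (U * S).reshape(out_ch, rank, 1)
    if conv.bias is not None:
        top.bias.data = conv.bias.detach().clone()
    return nn.Sequential(bottom, top)

# Example: factorize one dilated convolution of a WaveNet-style stack.
layer = nn.Conv1d(64, 128, kernel_size=2, dilation=4)
factored = svd_factorize_conv1d(layer, rank=16)
x = torch.rand(1, 64, 1000)
print(layer(x).shape, factored(x).shape)   # same output shape, far fewer parameters
```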

Title: Generalized Large-Context Language Models Based on Forward-Backward Hierarchical Recurrent Encoder-Decoder Models
Authors: Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Itsumi Saito, Kyosuke Nishida, T. Oba
DOI: 10.1109/ASRU46091.2019.9003857 (https://doi.org/10.1109/ASRU46091.2019.9003857)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: This paper presents a generalized form of large-context language models (LCLMs) that can take linguistic contexts beyond utterance boundaries into consideration. In discourse-level and conversation-level automatic speech recognition (ASR) tasks, which have to handle a series of utterances, it is essential to capture long-range linguistic contexts beyond utterance boundaries. The LCLMs of previous studies mainly focused on utilizing past contexts, and none fully utilized future contexts because LMs typically process words in a time-ordered manner. Our key idea is to introduce the LCLMs into the situation where ASR results of the whole series of utterances are given by a first decoding pass. This situation makes it possible for the LCLMs to leverage future contexts. In this paper, we propose generalized LCLMs (GLCLMs) based on forward-backward hierarchical recurrent encoder-decoder models in which generative probabilities of individual utterances are computed by leveraging not only past contexts but also future contexts beyond utterance boundaries. In order to efficiently introduce GLCLMs to ASR, we also propose a global-context iterative rescoring method that repeatedly rescores the ASR hypotheses of an individual utterance by using surrounding ASR hypotheses. Experiments on discourse-level ASR tasks demonstrate the effectiveness of our GLCLM approach.
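
A schematic Python sketch of the global-context iterative rescoring loop described above: each utterance's first-pass hypotheses are repeatedly rescored given the current hypotheses of the surrounding utterances. The scoring function stands in for the GLCLM and is a placeholder, as is the toy example at the end.

```python
def rescore_conversation(nbest_lists, score_with_context, num_iters=3):
    """Iteratively rescore each utterance's n-best list using past and future context."""
    # Initialize with the first-pass 1-best hypothesis of every utterance.
    current = [nbest[0] for nbest in nbest_lists]
    for _ in range(num_iters):
        for i, nbest in enumerate(nbest_lists):
            past, future = current[:i], current[i + 1:]
            # Pick the candidate the large-context LM prefers given both contexts.
            current[i] = max(nbest, key=lambda hyp: score_with_context(hyp, past, future))
    return current

# Trivial stand-in scorer: candidates sharing more words with neighbouring hypotheses win.
def toy_scorer(hyp, past, future):
    context_words = set(w for utt in past + future for w in utt.split())
    return sum(w in context_words for w in hyp.split())

nbest_lists = [["good morning", "good mourning"], ["morning to you", "mourning to you"]]
print(rescore_conversation(nbest_lists, toy_scorer))
```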

Title: Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output
Authors: Patrick Lumban Tobing, Tomoki Hayashi, T. Toda
DOI: 10.1109/ASRU46091.2019.9003800 (https://doi.org/10.1109/ASRU46091.2019.9003800)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: In this paper, an investigation of a shallow architecture and a Laplacian distribution output for a WaveNet vocoder trained with limited data is presented. A shallower WaveNet architecture is proposed as a better fit for limited-data use cases and to reduce computation time. To further improve the modeling capability of the WaveNet vocoder, a Laplacian distribution output is proposed. The Laplacian distribution is inherently sparse, with a higher peak and a fatter tail than the Gaussian, which may be more suitable for speech signal modeling. The experimental results demonstrate that: 1) the proposed shallow variant of the WaveNet architecture gives performance comparable to the deep one with softmax output, while reducing the computation time by 73%; and 2) the use of the Laplacian distribution output consistently improves speech quality across various amounts of limited training data, reaching a value of 4.22 for the two highest mean opinion scores.
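
A minimal PyTorch sketch of the training criterion implied by a Laplacian output layer: the network predicts a location and a log-scale per waveform sample, and the loss is the Laplacian negative log-likelihood. The shapes and values below are illustrative, not from the paper.

```python
import torch

def laplacian_nll(mu, log_b, target):
    """Negative log-likelihood of target samples under a Laplacian with location `mu`
    and scale exp(`log_b`), both predicted per time step by the vocoder network."""
    b = torch.exp(log_b)
    return (log_b + torch.log(torch.tensor(2.0)) + torch.abs(target - mu) / b).mean()

# The final layer emits two values per sample (mu, log_b) instead of a 256-way softmax
# over quantized amplitudes; at synthesis time each sample is drawn from Laplace(mu, b).
mu = torch.zeros(4, 16000)
log_b = torch.full((4, 16000), -2.0)
target = 0.1 * torch.randn(4, 16000)
print(laplacian_nll(mu, log_b, target))
```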

Title: Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition
Authors: Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
DOI: 10.1109/ASRU46091.2019.9003741 (https://doi.org/10.1109/ASRU46091.2019.9003741)
Venue: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2019
Abstract: Prolongation is a speech disfluency that lengthens some portions of speech utterances. It is frequently observed in children's spontaneous speech, while it is rare in read speech. Making acoustic models more robust to children's spontaneous speech usually requires collecting a large amount of children's speech data containing prolongation, which is impractical in many cases. To tackle this problem, we propose a novel data augmentation method that virtually generates additional data by simulating prolongation. The method inserts pseudo frames at specific positions of speech utterances to simulate prolongation, with the acoustic features of the inserted frames calculated from the original frames on both sides. This is based on our analysis that many vowels are actually stretched in children's spontaneous speech. Our procedure can generate partially stretched utterances at low computational cost, unlike conventional speed or tempo perturbation methods that extend or shrink entire utterances at a uniform rate. The effectiveness of the proposed method was confirmed in acoustic model adaptation experiments, in which our vowel-stretch method showed consistent improvement over the conventional speed and tempo perturbation approaches.
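
A small NumPy sketch of the augmentation idea: pseudo frames interpolated from their neighbours are inserted at chosen positions to simulate locally prolonged speech. The positions and repeat counts here are hypothetical placeholders; the paper derives them from vowel locations in the utterance.

```python
import numpy as np

def stretch_frames(features, positions, repeats=2):
    """Insert pseudo frames after the given frame indices; each inserted frame is the
    average of its two neighbours, simulating a locally prolonged (stretched) segment."""
    out = []
    for t, frame in enumerate(features):
        out.append(frame)
        if t in positions and t + 1 < len(features):
            pseudo = 0.5 * (features[t] + features[t + 1])
            out.extend([pseudo] * repeats)
    return np.stack(out)

# Example: stretch the frames around two (hypothetical) vowel centres of an utterance.
feats = np.random.rand(100, 40)                 # 100 frames of 40-dim acoustic features
augmented = stretch_frames(feats, positions={30, 70}, repeats=3)
print(feats.shape, augmented.shape)             # (100, 40) (106, 40)
```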