2018 IEEE Spoken Language Technology Workshop (SLT): Latest Publications

Far-Field ASR Using Low-Rank and Sparse Soft Targets from Parallel Data
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639579
Pranay Dighe, Afsaneh Asaei, H. Bourlard
Abstract: Far-field automatic speech recognition (ASR) of conversational speech is often considered a very challenging task due to the poor quality of the alignments available for training the DNN acoustic models. A common way to alleviate this problem is to use clean alignments obtained from parallelly recorded close-talk speech data. In this work, we advance the parallel-data approach by obtaining enhanced low-rank and sparse soft targets from a close-talk ASR system and using them to train more accurate far-field acoustic models. Specifically, we (i) exploit eigenposteriors and compressive-sensing dictionaries to learn low-dimensional senone subspaces in the DNN posterior space, and (ii) enhance close-talk DNN posteriors to obtain high-quality soft targets for training far-field DNN acoustic models. We show that the enhanced soft targets encode the structural and temporal interrelationships among senone classes, which are easily accessible in the DNN posterior space of close-talk speech but not in its noisy far-field counterpart. We exploit the enhanced soft targets to improve the mapping of far-field acoustics to close-talk senone classes. The experiments are performed on the AMI meeting corpus, where our approach improves DNN-based acoustic modeling by a 4.4% absolute (~8% relative) reduction in WER compared to a system that does not use parallel data. Finally, the approach is also validated on state-of-the-art recurrent and time-delay neural network architectures.
Citations: 2
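The enhancement step described above can be pictured as projecting the frame-level senone posterior matrix onto a low-dimensional subspace and renormalizing. Below is a minimal numpy sketch using a plain truncated SVD as a stand-in for the paper's eigenposterior and compressive-sensing dictionary machinery; the matrix shapes and rank are illustrative assumptions:

```python
import numpy as np

def low_rank_soft_targets(posteriors, rank=40, eps=1e-8):
    """Project a (frames x senones) posterior matrix onto its top
    `rank` singular directions and renormalize rows to sum to 1.

    A rough stand-in for the eigenposterior-based enhancement in the
    paper; the actual method learns per-senone subspaces and also
    retains a sparse component over a learned dictionary.
    """
    U, s, Vt = np.linalg.svd(posteriors, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    low_rank = np.clip(low_rank, 0.0, None)   # posteriors are non-negative
    low_rank /= low_rank.sum(axis=1, keepdims=True) + eps
    return low_rank

# Toy usage: 500 frames of close-talk DNN posteriors over 2000 senones.
post = np.random.dirichlet(np.ones(2000) * 0.1, size=500)
targets = low_rank_soft_targets(post, rank=40)
```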
Comparing Prosodic Frameworks: Investigating the Acoustic-Symbolic Relationship in ToBI and RaP
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639539
Raul Fernandez, A. Rosenberg
Abstract: ToBI is the dominant tool for symbolically describing prosodic content in American English speech material. This is due to its descriptive power and theoretical grounding, but also to the amount of available annotated data. Recently, a modest amount of material annotated with the Rhythm and Pitch (RaP) framework was released publicly. In this paper, we investigate the acoustic-symbolic relationship under these two systems. We present experiments looking at this relationship in both directions. From acoustic to symbolic, we compare the automatic prediction of prosodic prominence as defined under the two systems. From symbolic to acoustic, we examine the utility of these annotation standards for correctly prescribing the acoustics of a given utterance from its symbolic sequences. We find RaP promising: given a comparable amount of data, it shows a somewhat stronger acoustic-symbolic relationship than ToBI for some aspects of these tasks. While ToBI results are stronger with more annotated data, it remains to be shown whether RaP performance can scale up.
Citations: 0
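The acoustic-to-symbolic direction of such a comparison reduces to training a classifier to predict a word's prominence label from local acoustic measurements and comparing accuracies under the two annotation schemes. A toy sketch with scikit-learn, where the features, labels, and model choice are placeholders rather than the paper's setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder per-word acoustic features: [mean F0, F0 range,
# duration, RMS energy]; labels are binary prominence marks under
# one annotation framework (ToBI or RaP).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = rng.integers(0, 2, size=1000)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("prominence prediction accuracy: %.3f +/- %.3f"
      % (scores.mean(), scores.std()))
```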
Word Segmentation From Phoneme Sequences Based On Pitman-Yor Semi-Markov Model Exploiting Subword Information
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639607
Ryu Takeda, Kazunori Komatani, Alexander I. Rudnicky
Abstract: Word segmentation from phoneme sequences is essential for identifying unknown (out-of-vocabulary; OOV) words in spoken dialogues. The Pitman-Yor semi-Markov model (PYSMM) is used for word segmentation because it handles dynamically growing vocabularies. The obtained vocabularies, however, still include meaningless entries due to the insufficient cues available in phoneme sequences. We focus here on using subword information to capture patterns as "words." We propose 1) a model based on subword N-grams and subword estimation using a vocabulary set, and 2) posterior fusion of the results of a PYSMM and our model to take advantage of both. Our experiments showed 1) the potential of using subword information for OOV acquisition, and 2) that our method outperformed the PYSMM by 1.53 and 1.07 points in F-measure on the obtained OOV sets for English and Japanese corpora, respectively.
Citations: 1
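The abstract does not spell out the fusion formula, but posterior fusion of two segmenters can be illustrated as a weighted log-linear combination of the word-boundary probabilities each model assigns at every phoneme position. A toy sketch under that assumption, with the weight and probabilities invented for illustration:

```python
import math

def fuse_boundary_posteriors(p_pysmm, p_subword, weight=0.5, eps=1e-12):
    """Log-linear fusion of per-position word-boundary posteriors
    from a PYSMM and a subword N-gram model (both lists of
    probabilities in [0, 1], one per phoneme position)."""
    fused = []
    for p1, p2 in zip(p_pysmm, p_subword):
        log_p = weight * math.log(p1 + eps) + (1 - weight) * math.log(p2 + eps)
        fused.append(math.exp(log_p))
    return fused

# Toy posteriors over 6 phoneme positions.
pysmm   = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7]
subword = [0.8, 0.2, 0.1, 0.9, 0.4, 0.6]
print(fuse_boundary_posteriors(pysmm, subword))
```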
Improving FFTNet Vocoder with Noise Shaping and Subband Approaches
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639687
T. Okamoto, T. Toda, Y. Shiga, H. Kawai
Abstract: Although FFTNet neural vocoders can synthesize speech waveforms in real time, the synthesized speech quality is worse than that of WaveNet vocoders. To improve the synthesized speech quality of FFTNet while ensuring real-time synthesis, residual connections are introduced to enhance the prediction accuracy. Additionally, time-invariant noise shaping and subband approaches, which significantly improve the synthesized speech quality of WaveNet vocoders, are applied. A subband FFTNet vocoder with multiband input is also proposed to directly compensate for the phase shift between subbands. The proposed approaches are evaluated through experiments using a Japanese male corpus with a sampling frequency of 16 kHz. The results are compared with those synthesized by the STRAIGHT vocoder without mel-cepstral compression and those from conventional FFTNet and WaveNet vocoders. The proposed approaches are shown to successfully improve the synthesized speech quality of the FFTNet vocoder. In particular, the use of noise shaping enables FFTNet to significantly outperform the STRAIGHT vocoder.
Citations: 14
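The subband idea can be illustrated with a complementary two-band FIR split: the highpass branch is a delayed unit impulse minus the lowpass filter, so the two band signals sum back to a delayed copy of the input. A minimal scipy sketch; the filter length and cutoff are arbitrary choices, and a real subband vocoder would also decimate and model each band:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def two_band_split(x, numtaps=101, cutoff=0.5):
    """Split x into complementary low/high bands. cutoff is in
    normalized frequency (1.0 = Nyquist)."""
    lp = firwin(numtaps, cutoff)        # linear-phase lowpass
    hp = -lp
    hp[numtaps // 2] += 1.0             # delta - lowpass = highpass
    return lfilter(lp, [1.0], x), lfilter(hp, [1.0], x)

# 16 kHz toy signal: the two bands sum to the input delayed by
# (numtaps - 1) / 2 = 50 samples.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)
low, high = two_band_split(x)
recon = low + high
print(np.allclose(recon[50:], x[:-50], atol=1e-6))  # True up to the delay
```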
JSpeech: A Multi-Lingual Conversational Speech Corpus
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639658
A. J. Choobbasti, Mohammad Erfan Gholamian, Amir Vaheb, Saeid Safavi
Abstract: Speech processing and automatic speech and speaker recognition are major areas of interest in the field of computational linguistics. Research and development in human-computer interaction, forensic technologies, and dialogue systems have been the motivating factors behind this interest. In this paper, JSpeech, a multi-lingual corpus, is introduced. This corpus contains 1332 hours of conversational speech from 47 different languages. Created from 106 public chat groups, the corpus can be used in a variety of studies, such as the effect of language variability on the performance of speaker recognition systems and automatic language detection. To this end, we include speaker verification results obtained for this corpus using a state-of-the-art method based on a 3D convolutional neural network.
Citations: 1
Detection and Calibration of Whisper for Speaker Recognition
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639595
Finnian Kelly, J. Hansen
Abstract: Whisper is a commonly encountered form of speech that differs significantly from modal speech. As speaker recognition technology becomes more ubiquitous, it is important to assess the abilities and limitations of systems in the presence of variability such as whisper. In this paper, a comparative evaluation of whispered speaker recognition performance across two independent datasets is presented. Whisper-neutral speech comparisons are observed to consistently degrade performance relative to both neutral-neutral and whisper-whisper comparisons. An i-vector-based approach to whisper detection is introduced, and is shown to perform accurately across datasets even at short durations. The output of the whisper detector is subsequently used to select score calibration parameters for whispered speech comparisons, leading to a reduction in global calibration and discrimination error.
Citations: 4
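The detector-driven calibration can be sketched as: classify each side of a trial as whispered or neutral, then apply the linear score calibration trained for that condition pair. A toy sketch in which the nearest-mean detector and the calibration constants are invented stand-ins, not values or models from the paper:

```python
import numpy as np

def detect_whisper(ivec, whisper_mean, neutral_mean):
    """Toy stand-in for the paper's i-vector whisper detector:
    assign the class whose mean i-vector is closer in cosine."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos(ivec, whisper_mean) > cos(ivec, neutral_mean)

# Invented per-condition linear calibrations (scale, offset), e.g.
# trained by logistic regression on held-out trials per condition.
CALIBRATION = {
    ("neutral", "neutral"): (1.00, 0.0),
    ("neutral", "whisper"): (0.55, -1.2),
    ("whisper", "whisper"): (0.80, -0.4),
}

def calibrated_llr(raw_score, enroll_ivec, test_ivec, w_mean, n_mean):
    """Map a raw recognition score to a calibrated LLR using the
    calibration trained for the detected condition pair."""
    cond = tuple(sorted(
        "whisper" if detect_whisper(v, w_mean, n_mean) else "neutral"
        for v in (enroll_ivec, test_ivec)))
    a, b = CALIBRATION[cond]
    return a * raw_score + b
```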
Phase-Based Feature Representations for Improving Recognition of Dysarthric Speech
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639031
S. Sehgal, S. Cunningham, P. Green
Abstract: Dysarthria is a neurological speech impairment, which usually results in the loss of motor speech control due to muscular atrophy and incoordination of the articulators. As a result, the speech becomes less intelligible and difficult to model with machine learning algorithms due to inconsistencies in the acoustic signal and data sparseness. This paper presents phase-based feature representations for dysarthric speech derived from the group delay spectrum. Such representations are found to be better suited to characterising the resonances of the vocal tract, exhibit better phone discrimination capabilities in dysarthric signals, and consequently improve ASR performance. All the experiments were conducted using the UASPEECH corpus, and significant ASR gains are reported for phase-based cepstral features in comparison to standard MFCCs, irrespective of the severity of the condition.
Citations: 4
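The group delay spectrum behind such features is the negative derivative of the phase spectrum, which can be computed without phase unwrapping from the DFTs of x[n] and n·x[n]. A minimal sketch; the frame length, FFT size, and denominator floor are illustrative, and the paper's features would further involve modified group delay processing and cepstral conversion:

```python
import numpy as np

def group_delay_spectrum(frame, n_fft=512, eps=1e-10):
    """Group delay via the n*x[n] identity, avoiding explicit
    phase unwrapping:
        tau(w) = (Re X * Re Y + Im X * Im Y) / |X|^2,
    with X = DFT(x) and Y = DFT(n * x[n])."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Toy 25 ms frame at 16 kHz; cepstral features would then be derived
# from a smoothed, modified version of this spectrum.
x = np.hamming(400) * np.random.randn(400)
tau = group_delay_spectrum(x)
```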
Adaptive Wavenet Vocoder for Residual Compensation in GAN-Based Voice Conversion
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639507
Berrak Sisman, Mingyang Zhang, S. Sakti, Haizhou Li, Satoshi Nakamura
Abstract: In this paper, we propose to use generative adversarial networks (GANs) together with a WaveNet vocoder to address the over-smoothing problem arising from deep learning approaches to voice conversion, and to improve the vocoding quality over traditional vocoders. As the GAN aims to minimize the divergence between the natural and converted speech parameters, it effectively alleviates the over-smoothing problem in the converted speech. On the other hand, the WaveNet vocoder allows us to leverage human speech from a large speaker population, thus improving the naturalness of the synthetic voice. Furthermore, for the first time, we study how to use the WaveNet vocoder for residual compensation to improve voice conversion performance. The experiments show that the proposed voice conversion framework consistently outperforms the baselines.
Citations: 37
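The adversarial part of such a framework can be pictured as a discriminator trained to separate natural from converted spectral features, with the conversion network receiving an extra loss term for fooling it, which counteracts over-smoothing. A minimal PyTorch sketch where the layer sizes, losses, and weighting are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

feat_dim = 40  # e.g. mel-cepstral coefficients per frame

G = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                  nn.Linear(256, feat_dim))          # source -> target features
D = nn.Sequential(nn.Linear(feat_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))                 # natural-vs-converted critic
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

def generator_loss(src, tgt, adv_weight=0.1):
    """Regression to the target plus an adversarial term pushing the
    converted features toward the natural distribution."""
    conv = G(src)
    adv = bce(D(conv), torch.ones(conv.size(0), 1))
    return mse(conv, tgt) + adv_weight * adv

def discriminator_loss(src, tgt):
    """Train D to label natural frames 1 and converted frames 0."""
    conv = G(src).detach()
    return (bce(D(tgt), torch.ones(tgt.size(0), 1)) +
            bce(D(conv), torch.zeros(conv.size(0), 1)))
```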
Role Annotated Speech Recognition for Conversational Interactions
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639611
Nikolaos Flemotomos, Zhuohao Chen, David C. Atkins, Shrikanth S. Narayanan
Abstract: Speaker Role Recognition (SRR) assigns a specific speaker role to each speaker-homogeneous speech segment in a conversation. Typically, those segments first have to be identified through a diarization step. Additionally, since SRR is usually based on the different linguistic patterns observed between the roles to be recognized, an Automatic Speech Recognition (ASR) system is also indispensable for the task at hand to convert speech to text. In this work we introduce a Role Annotated Speech Recognition (RASR) system which, given a speech signal, outputs a sequence of words annotated with the corresponding speaker roles. Thus, the need for different component modules connected in a way that may lead to error propagation is eliminated. We present, analyze, and test our system for the case of two speaker roles to showcase an end-to-end approach for automatic rich transcription, with application to clinical dyadic interactions.
Citations: 2
Context-Aware Attention Mechanism for Speech Emotion Recognition
2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639633
Gaetan Ramet, Philip N. Garner, Michael Baeriswyl, Alexandros Lazaridis
Abstract: In this work, we study the use of attention mechanisms to enhance the performance of the state-of-the-art deep learning model in Speech Emotion Recognition (SER). We introduce a new Long Short-Term Memory (LSTM)-based neural network attention model which is able to take into account the temporal information in speech during the computation of the attention vector. The proposed LSTM-based model is evaluated on the IEMOCAP dataset using a 5-fold cross-validation scheme and achieves 68.8% weighted accuracy on 4 classes, which outperforms the state-of-the-art models.
Citations: 37
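Attention pooling over LSTM outputs can be sketched as a learned per-frame score followed by a softmax-weighted average of the hidden states. A minimal PyTorch sketch of the generic idea; the dimensions and additive scoring form are assumptions, not the paper's exact context-aware model:

```python
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    """BLSTM encoder + attention pooling for utterance-level emotion
    classification (a generic sketch, not the paper's architecture)."""
    def __init__(self, feat_dim=40, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)   # per-frame attention energy
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)                      # (batch, time, 2*hidden)
        alpha = torch.softmax(self.score(h), dim=1)   # weights over time
        context = (alpha * h).sum(dim=1)         # attention-weighted average
        return self.out(context)

logits = AttentiveLSTM()(torch.randn(8, 300, 40))  # 8 utterances, 300 frames
```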