2016 IEEE Spoken Language Technology Workshop (SLT): Latest Publications

Towards a virtual personal assistant based on a user-defined portfolio of multi-domain vocal applications
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846252
Tatiana Ekeinhor-Komi, J. Bouraoui, R. Laroche, F. Lefèvre
Abstract: This paper proposes a novel approach to defining and simulating a new generation of virtual personal assistants as multi-application, multi-domain distributed dialogue systems. The first contribution is the assistant architecture, composed of independent third-party applications handled by a Dispatcher. In this view, applications are black boxes responding to user requests with a self-scored answer. The Dispatcher distributes the current request to the most relevant application, based on these scores and the context (history of interaction, etc.), and conveys its answer to the user. To address variations in the user-defined portfolio of applications, the second contribution, a stochastic model, automates the online optimisation of the Dispatcher's behaviour. To evaluate the learnability of the Dispatcher's policy, several parametrisations of the user and application simulators are enabled, such that they cover variations of realistic situations. Results confirm, in all considered configurations of interest, that reinforcement learning can learn adapted strategies.
Citations: 2
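
To make the Dispatcher architecture concrete, here is a minimal sketch of score-based routing. All names (AppAnswer, Dispatcher, the 0.05 context bonus) are illustrative, and the hand-written max-score rule stands in for the reinforcement-learned policy the paper actually optimises:

```python
from dataclasses import dataclass

@dataclass
class AppAnswer:
    text: str
    score: float          # the application's self-assessed confidence

class Dispatcher:
    """Routes each user request to the most relevant application."""
    def __init__(self, apps):
        self.apps = apps      # name -> callable(request) -> AppAnswer
        self.history = []     # (request, chosen application) pairs

    def dispatch(self, request):
        # Every black-box application returns a self-scored answer.
        answers = {name: app(request) for name, app in self.apps.items()}
        last = self.history[-1][1] if self.history else None
        # Toy context bias: slightly prefer the application used last turn.
        best = max(answers,
                   key=lambda n: answers[n].score + (0.05 if n == last else 0.0))
        self.history.append((request, best))
        return answers[best].text

# Two toy applications with hand-written self-scoring:
weather = lambda q: AppAnswer("Sunny tomorrow.", 0.9 if "weather" in q else 0.1)
music = lambda q: AppAnswer("Playing jazz.", 0.8 if "play" in q else 0.1)
assistant = Dispatcher({"weather": weather, "music": music})
print(assistant.dispatch("what is the weather"))   # -> Sunny tomorrow.
```
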
Code-switching detection using multilingual DNNs
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846326
Emre Yilmaz, H. V. D. Heuvel, D. V. Leeuwen
Abstract: Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNNs) for the ASR of Frisian speech containing code-switches to Dutch, with the aim of building a robust recognizer that can handle this phenomenon. For this purpose, we train several multilingual DNN models on Frisian and two closely related languages, namely English and Dutch, to compare the impact of single-step and two-step multilingual DNN training on recognition and code-switching detection performance. We apply bilingual DNN retraining on both target languages, varying the amount of training data belonging to the higher-resourced target language (Dutch). The recognition results show that the multilingual DNN training scheme with an initial multilingual training step followed by bilingual retraining provides recognition performance comparable to an oracle baseline recognizer that can employ language-specific acoustic models. We further show that we can detect code-switches at the word level with an equal error rate of around 17%, excluding deletions due to ASR errors.
Citations: 31
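
The ~17% figure above is an equal error rate (EER). As a reference, here is a minimal numpy sketch of how an EER is computed from detector scores, with the operating threshold found by a simple sweep; the synthetic scores are illustrative, not the authors' evaluation code:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """scores: detector outputs; labels: 1 = code-switch, 0 = no switch."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    best = (1.0, 0.0)                      # (FAR, FRR) with largest gap
    for t in np.sort(np.unique(scores)):
        decisions = scores >= t
        far = np.mean(decisions[labels == 0])    # false acceptance rate
        frr = np.mean(~decisions[labels == 1])   # false rejection rate
        if abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    # EER is the point where FAR and FRR cross; average the closest pair.
    return (best[0] + best[1]) / 2

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)
scores = labels + rng.normal(0, 0.8, 1000)   # a deliberately noisy detector
print(f"EER = {equal_error_rate(scores, labels):.1%}")
```
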
Dialog state tracking with attention-based sequence-to-sequence learning
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846317
Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, B. Harsham, Jonathan Le Roux, J. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, T. Aikawa
Abstract: We present an advanced dialog state tracking system designed for the 5th Dialog State Tracking Challenge (DSTC5). The main task of DSTC5 is to track the dialog state in a human-human dialog. For each utterance, the tracker emits a frame of slot-value pairs considering the full history of the dialog up to the current turn. Our system includes an encoder-decoder architecture with an attention mechanism to map an input word sequence to a set of semantic labels, i.e., slot-value pairs. This handles the problem of the unknown alignment between the utterances and the labels. By combining the attention-based tracker with rule-based trackers elaborated for English and Chinese, the F-score on the development set improved from 0.475 to 0.507 compared to the rule-only trackers. Moreover, we achieved a 0.517 F-score by refining the combination strategy based on the topic- and slot-level performance of each tracker. In this paper, we also validate the efficacy of each technique and report the test set results submitted to the challenge.
Citations: 27
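
A minimal numpy sketch of the attention step such an encoder-decoder tracker relies on: the decoder state attends over the encoder states of the utterance and receives a context vector from which slot-value labels are predicted. Shapes and names are illustrative, not the authors' implementation:

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (T, d) for T input words."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over time steps
    return weights @ encoder_states, weights       # context vector, attention

T, d = 6, 4
enc = np.random.randn(T, d)                        # encoder states (toy)
ctx, w = attention_context(np.random.randn(d), enc)
print(w.round(2), ctx.shape)   # attention over the 6 words, context of shape (4,)
```
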
Towards acoustic model unification across dialects
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846328
Mohamed G. Elfeky, M. Bastani, Xavier Velez, P. Moreno, Austin Waters
Abstract: Acoustic model performance typically decreases when evaluated on a dialectal variation of the same language that was not used during training. Similarly, models simultaneously trained on a group of dialects tend to underperform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented: Distillation and Multitask Learning (MTL). In Distillation, we use an ensemble of dialect-specific acoustic models and distill its knowledge into a single model. In MTL, we utilize multitask learning to train a unified acoustic model that learns to distinguish dialects as a side task. We show that both techniques are superior to the jointly-trained model trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively. While achieving this improvement, neither technique degrades the performance of the dialect-specific models by more than 3.4%.
Citations: 29
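
A short PyTorch sketch of the Distillation technique described above, assuming the standard temperature-smoothed formulation: averaged soft posteriors from the dialect-specific teachers supervise the single unified student. Dimensions and the temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, T=2.0):
    # Average the dialect-specific teachers' tempered posteriors.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 as is standard for distillation.
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * T * T

student = torch.randn(8, 40)                       # batch of 8, 40 classes (toy)
teachers = [torch.randn(8, 40) for _ in range(3)]  # 3 dialect-specific models
print(distillation_loss(student, teachers).item())
```
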
Blind speech segmentation using spectrogram image-based features and Mel cepstral coefficients
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846324
Adriana Stan, Cassia Valentini-Botinhao, B. Orza, M. Giurgiu
Abstract: This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We consider the spectrogram of an utterance's waveform as an image and hypothesize that its striping defects, i.e., discontinuities, appear due to phone boundaries. These discontinuities are found using a simple image destriping algorithm. To discover phone transitions that are less salient in the image, we compute spectral changes derived from the time evolution of a Mel cepstral parametrisation of the speech. These so-called image-based and acoustic features are then combined to form a mixed probability function, whose values indicate the likelihood of a phone boundary being located at the corresponding time frame. The method is completely unsupervised and achieves an accuracy of 75.59% at a −3.26% over-segmentation rate, yielding an F-measure of 0.76 and a 0.80 R-value on the TIMIT dataset.
Citations: 11
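
A rough numpy/scipy sketch of the mixed probability function idea, assuming a simple linear combination: one cue from column-wise spectrogram discontinuities (the "striping"), one from frame-to-frame Mel-cepstral change, normalised, mixed, and peak-picked. The synthetic features and the weight alpha are placeholders, not the paper's actual destriping algorithm:

```python
import numpy as np
from scipy.signal import find_peaks

def boundary_likelihood(spectrogram, mfcc, alpha=0.5):
    """spectrogram: (F, T); mfcc: (C, T). Returns per-frame likelihood (T-1,)."""
    img_cue = np.abs(np.diff(spectrogram, axis=1)).sum(0)   # column discontinuity
    ac_cue = np.linalg.norm(np.diff(mfcc, axis=1), axis=0)  # spectral change
    norm = lambda x: (x - x.min()) / (np.ptp(x) + 1e-9)     # rescale to [0, 1]
    return alpha * norm(img_cue) + (1 - alpha) * norm(ac_cue)

T = 200
spec = np.abs(np.random.randn(128, T)).cumsum(1) % 3.0   # toy "spectrogram"
mfcc = np.random.randn(13, T)                            # toy cepstral features
like = boundary_likelihood(spec, mfcc)
peaks, _ = find_peaks(like, distance=5)   # candidate phone boundaries (frames)
print(peaks[:10])
```
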
The NDSC transcription system for the 2016 multi-genre broadcast challenge
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846276
Xukui Yang, Dan Qu, Wenlin Zhang, Weiqiang Zhang
Abstract: The National Digital Switching System Engineering and Technological R&D Center (NDSC) speech-to-text transcription system for the 2016 multi-genre broadcast challenge is described. Various acoustic models based on deep neural networks (DNNs) are trained, including hybrid DNN, long short-term memory recurrent neural network (LSTM RNN), and time-delay neural network (TDNN) models. The system also makes use of recurrent neural network language models (RNNLMs) for re-scoring and minimum Bayes risk (MBR) combination. The WER on the test dataset of the speech-to-text task is 18.2%. Furthermore, to simulate real applications where manual segmentations are not available, an automatic segmentation system based on long-term information is proposed. WERs based on the automatically generated segments are slightly worse than those based on the manual segmentations.
Citations: 6
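
A toy sketch of the N-best rescoring step mentioned above: each hypothesis' total score is re-combined from its acoustic score and an interpolation of RNNLM and n-gram log-probabilities before re-ranking. The stand-in scoring functions and weights are illustrative, not the NDSC system's models:

```python
def rescore_nbest(nbest, ngram_lp, rnnlm_lp, lm_scale=12.0, lam=0.5):
    """nbest: list of (words, acoustic_logprob); *_lp: sentence log-prob fns."""
    def total(hyp):
        words, ac = hyp
        # Interpolate the two language models, then combine with the acoustics.
        lm = lam * rnnlm_lp(words) + (1 - lam) * ngram_lp(words)
        return ac + lm_scale * lm
    return max(nbest, key=total)

# Toy stand-in language models (a real system would query trained LMs here).
ngram = lambda w: -0.5 * len(w)
rnnlm = lambda w: -0.4 * len(w) - (0.0 if w[-1] == "</s>" else 1.0)
nbest = [(["the", "cat", "</s>"], -10.0),
         (["a", "cat", "sat", "</s>"], -9.0)]
print(rescore_nbest(nbest, ngram, rnnlm)[0])   # -> ['the', 'cat', '</s>']
```
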
Pre-filtered dynamic time warping for posteriorgram based keyword search
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846292
Gozde Cetinkaya, Batuhan Gündogdu, M. Saraçlar
Abstract: In this study, we present a pre-filtering method for dynamic time warping (DTW) to improve the efficiency of a posteriorgram based keyword search (KWS) system. The ultimate aim is to improve the performance of a large vocabulary continuous speech recognition (LVCSR) based KWS system using the posteriorgram based KWS approach. We use phonetic posteriorgrams to represent the audio data and generate average posteriorgrams to represent the given text queries. The DTW algorithm is used to determine the optimal alignment between the posteriorgrams of the audio data and the queries. Since DTW has quadratic complexity, it can be relatively inefficient for keyword search. Our main contribution is to reduce this complexity by pre-filtering based on a vector space representation of the two posteriorgrams without any degradation in performance. Experimental results show that our system reduces the complexity and, when combined with the baseline LVCSR based KWS system, it improves the performance both for the out-of-vocabulary (OOV) queries and the in-vocabulary (IV) queries.
Citations: 3
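
A minimal numpy sketch of the pre-filtering idea: each candidate window is first screened with a cheap cosine similarity between averaged posteriorgrams, and the quadratic-cost DTW alignment runs only on windows that pass. The threshold, hop, and synthetic posteriorgrams are illustrative assumptions:

```python
import numpy as np

def dtw_cost(A, B):
    """A: (m, d), B: (n, d) posteriorgrams; returns normalised alignment cost."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n] / (m + n)

def search(query, audio, win, hop=5, thresh=0.8):
    q_mean = query.mean(0)                 # average posteriorgram of the query
    hits = []
    for s in range(0, len(audio) - win + 1, hop):
        seg = audio[s:s + win]
        v = seg.mean(0)
        cos = v @ q_mean / (np.linalg.norm(v) * np.linalg.norm(q_mean) + 1e-9)
        if cos >= thresh:                         # cheap vector-space pre-filter
            hits.append((s, dtw_cost(query, seg)))  # expensive DTW only here
    return sorted(hits, key=lambda h: h[1])

rng = np.random.default_rng(1)
audio = rng.random((500, 40)); audio /= audio.sum(1, keepdims=True)
query = audio[100:140].copy()              # plant the "keyword" at frame 100
print(search(query, audio, win=40)[:3])    # best match should start near 100
```
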
A factor analysis model of sequences for language recognition
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846287
M. Omar
Abstract: The application of joint factor analysis [1] to speaker and language recognition advanced the performance of automatic systems in these areas. A special case of the early work in [1], namely the i-vector representation [2], has been applied successfully in many areas including speaker [2], language [3], and speech recognition [4]. This work presents a novel model which represents a long sequence of observations using a factor analysis model of shorter overlapping subsequences. This model takes into consideration the dependency of adjacent latent vectors. It is shown that this model outperforms the current joint factor analysis approach, which assumes independent and identically distributed (iid) observations given one global latent vector. In addition, we replace the language-independent prior model of the latent vector in the i-vector model with a language-dependent prior model, and we modify the objective function used in the estimation of the factor analysis projection matrix and the prior model to correspond to the cross-entropy objective function estimated under this new model. We also derive the update equations of the projection matrix and the prior model parameters which maximize the cross-entropy objective function. We evaluate the performance of our approach on the language recognition task of the robust automatic transcription of speech (RATS) project. Our experiments show relative improvements of up to 11% in terms of equal error rate using the proposed approach, compared to the standard approach of using an i-vector representation [2].
Citations: 1
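
For context, the baseline i-vector factor analysis model cited above, and the prior the paper replaces, can be sketched as follows; the per-language notation in the last line is ours, not the paper's:

```latex
% Baseline i-vector model: the utterance supervector M is a low-rank shift
% of the universal background model (UBM) mean supervector m,
% with T the total variability matrix and w the latent i-vector:
M = m + T\,w, \qquad w \sim \mathcal{N}(0, I)
% The language-dependent prior proposed above (notation ours), for language \ell:
w \sim \mathcal{N}(\mu_\ell, \Sigma_\ell)
```
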
Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846280
T. A. Hanai, Wei-Ning Hsu, James R. Glass
Abstract: The Arabic language, with over 300 million speakers, has significant diversity and breadth. This proves challenging when building an automated system to understand what is said. This paper describes an Arabic automatic speech recognition system developed on a 1,200-hour speech corpus that was made available for the 2016 Arabic Multi-genre Broadcast (MGB) Challenge. A range of Deep Neural Network (DNN) topologies were modeled, including feed-forward, convolutional, time-delay, recurrent Long Short-Term Memory (LSTM), Highway LSTM (H-LSTM), and Grid LSTM (G-LSTM) networks. The best performance came from a sequence-discriminatively trained G-LSTM neural network. The best overall Word Error Rate (WER) was 18.3% (p < 0.001) on the development set, after combining hypotheses from 3- and 5-layer sequence-discriminatively trained G-LSTM models that had been rescored with a 4-gram language model.
Citations: 15
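
As a toy illustration of the hypothesis combination step, here is a confidence-weighted vote per word position, assuming the hypotheses are already aligned to equal length; real systems (e.g. ROVER-style combination) perform that alignment first, which is omitted here:

```python
from collections import defaultdict

def combine(hypotheses):
    """hypotheses: list of (word_list, confidence); equal-length word lists."""
    out = []
    for pos in range(len(hypotheses[0][0])):
        votes = defaultdict(float)
        for words, conf in hypotheses:
            votes[words[pos]] += conf          # weight each vote by confidence
        out.append(max(votes, key=votes.get))  # keep the best-supported word
    return out

h1 = (["the", "cat", "sat"], 0.6)
h2 = (["the", "bat", "sat"], 0.3)
h3 = (["a",   "cat", "sat"], 0.3)
print(combine([h1, h2, h3]))   # -> ['the', 'cat', 'sat']
```
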
Syntax or semantics? Knowledge-guided joint semantic frame parsing
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846288
Yun-Nung (Vivian) Chen, Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Jianfeng Gao, L. Deng
Abstract: Spoken language understanding (SLU) is a core component of a spoken dialogue system; it involves intent prediction and slot filling and is also called semantic frame parsing. Recently, recurrent neural networks (RNNs) have obtained strong results on SLU due to their superior ability to preserve sequential information over time. Traditionally, the SLU component parses semantic frames for utterances considering their flat structures, as the underlying RNN structure is a linear chain. However, natural language exhibits linguistic properties that provide rich, structured information for better understanding. This paper proposes to apply knowledge-guided structural attention networks (K-SAN), which additionally incorporate non-flat network topologies guided by prior knowledge, to a language understanding task. With an attention mechanism, the model can effectively identify the salient substructures that are essential to parse a given utterance into its semantic frame, where two types of knowledge, syntax and semantics, are utilized. Experiments on the benchmark Air Travel Information System (ATIS) data and the conversational assistant Cortana data show that 1) the proposed K-SAN models with syntax or semantics outperform the state-of-the-art neural-network-based results, and 2) the improvement for joint semantic frame parsing is more significant, because the structured information provides rich cues for sentence-level understanding, where intent prediction and slot filling can be mutually improved.
Citations: 49
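
A minimal PyTorch sketch of the joint semantic frame parsing setting that K-SAN improves on: one shared encoder with two heads, utterance-level intent and per-token slot tags, trained jointly. The knowledge-guided structural attention itself is not reproduced here, and the label counts are ATIS-like guesses:

```python
import torch
import torch.nn as nn

class JointSLU(nn.Module):
    def __init__(self, vocab, n_intents, n_slots, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.enc = nn.GRU(d, d, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * d, n_intents)  # utterance-level head
        self.slot_head = nn.Linear(2 * d, n_slots)      # per-token head

    def forward(self, tokens):                 # tokens: (B, T) integer ids
        h, _ = self.enc(self.emb(tokens))      # (B, T, 2d) encoder states
        intent_logits = self.intent_head(h.mean(1))   # pool over time
        slot_logits = self.slot_head(h)               # (B, T, n_slots)
        return intent_logits, slot_logits

model = JointSLU(vocab=1000, n_intents=18, n_slots=127)  # ATIS-like sizes
toks = torch.randint(0, 1000, (2, 12))
intent, slots = model(toks)
print(intent.shape, slots.shape)  # torch.Size([2, 18]) torch.Size([2, 12, 127])
```
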