2016 IEEE Spoken Language Technology Workshop (SLT): Latest Publications

Towards a virtual personal assistant based on a user-defined portfolio of multi-domain vocal applications
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846252
Tatiana Ekeinhor-Komi, J. Bouraoui, R. Laroche, F. Lefèvre
Abstract: This paper proposes a novel approach to defining and simulating a new generation of virtual personal assistants as multi-application, multi-domain distributed dialogue systems. The first contribution is the assistant architecture, composed of independent third-party applications handled by a Dispatcher. In this view, applications are black boxes responding to user requests with a self-scored answer. The Dispatcher distributes the current request to the most relevant application, based on these scores and the context (history of interaction, etc.), and conveys its answer to the user. To address variations in the user-defined portfolio of applications, the second contribution, a stochastic model, automates the online optimisation of the Dispatcher's behaviour. To evaluate the learnability of the Dispatcher's policy, several parametrisations of the user and application simulators are enabled, such that they cover variations of realistic situations. Results confirm, in all considered configurations of interest, that reinforcement learning can learn adapted strategies.
Citations: 2
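
To make the Dispatcher architecture concrete, here is a minimal sketch of score-based routing. All names (AppAnswer, Dispatcher, the 0.05 context bonus) are illustrative, and the hand-written max-score rule stands in for the reinforcement-learned policy the paper actually optimises:

```python
from dataclasses import dataclass

@dataclass
class AppAnswer:
    text: str
    score: float          # the application's self-assessed confidence

class Dispatcher:
    """Routes each user request to the most relevant application."""
    def __init__(self, apps):
        self.apps = apps      # name -> callable(request) -> AppAnswer
        self.history = []     # (request, chosen application) pairs

    def dispatch(self, request):
        # Every black-box application returns a self-scored answer.
        answers = {name: app(request) for name, app in self.apps.items()}
        last = self.history[-1][1] if self.history else None
        # Toy context bias: slightly prefer the application used last turn.
        best = max(answers,
                   key=lambda n: answers[n].score + (0.05 if n == last else 0.0))
        self.history.append((request, best))
        return answers[best].text

# Two toy applications with hand-written self-scoring:
weather = lambda q: AppAnswer("Sunny tomorrow.", 0.9 if "weather" in q else 0.1)
music = lambda q: AppAnswer("Playing jazz.", 0.8 if "play" in q else 0.1)
assistant = Dispatcher({"weather": weather, "music": music})
print(assistant.dispatch("what is the weather"))   # -> Sunny tomorrow.
```
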
Code-switching detection using multilingual DNNs
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846326
Emre Yilmaz, H. V. D. Heuvel, D. V. Leeuwen
Abstract: Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNNs) for the ASR of Frisian speech containing code-switches to Dutch, with the aim of building a robust recognizer that can handle this phenomenon. For this purpose, we train several multilingual DNN models on Frisian and two closely related languages, namely English and Dutch, to compare the impact of single-step and two-step multilingual DNN training on recognition and code-switching detection performance. We apply bilingual DNN retraining on both target languages, varying the amount of training data belonging to the higher-resourced target language (Dutch). The recognition results show that the multilingual DNN training scheme with an initial multilingual training step followed by bilingual retraining provides recognition performance comparable to an oracle baseline recognizer that can employ language-specific acoustic models. We further show that we can detect code-switches at the word level with an equal error rate of around 17%, excluding deletions due to ASR errors.
Citations: 31
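
The ~17% figure above is an equal error rate (EER). As a reference, here is a minimal numpy sketch of how an EER is computed from detector scores, with the operating threshold found by a simple sweep; the synthetic scores are illustrative, not the authors' evaluation code:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """scores: detector outputs; labels: 1 = code-switch, 0 = no switch."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    best = (1.0, 0.0)                      # (FAR, FRR) with largest gap
    for t in np.sort(np.unique(scores)):
        decisions = scores >= t
        far = np.mean(decisions[labels == 0])    # false acceptance rate
        frr = np.mean(~decisions[labels == 1])   # false rejection rate
        if abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    # EER is the point where FAR and FRR cross; average the closest pair.
    return (best[0] + best[1]) / 2

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)
scores = labels + rng.normal(0, 0.8, 1000)   # a deliberately noisy detector
print(f"EER = {equal_error_rate(scores, labels):.1%}")
```
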
Dialog state tracking with attention-based sequence-to-sequence learning
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846317
Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, B. Harsham, Jonathan Le Roux, J. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, T. Aikawa
Abstract: We present an advanced dialog state tracking system designed for the 5th Dialog State Tracking Challenge (DSTC5). The main task of DSTC5 is to track the dialog state in a human-human dialog. For each utterance, the tracker emits a frame of slot-value pairs considering the full history of the dialog up to the current turn. Our system includes an encoder-decoder architecture with an attention mechanism to map an input word sequence to a set of semantic labels, i.e., slot-value pairs. This handles the problem of the unknown alignment between the utterances and the labels. By combining the attention-based tracker with rule-based trackers elaborated for English and Chinese, the F-score on the development set improved from 0.475 to 0.507 compared to the rule-only trackers. Moreover, we achieved a 0.517 F-score by refining the combination strategy based on the topic- and slot-level performance of each tracker. In this paper, we also validate the efficacy of each technique and report the test set results submitted to the challenge.
Citations: 27
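
A minimal numpy sketch of the attention step such an encoder-decoder tracker relies on: the decoder state attends over the encoder states of the utterance and receives a context vector from which slot-value labels are predicted. Shapes and names are illustrative, not the authors' implementation:

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (T, d) for T input words."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over time steps
    return weights @ encoder_states, weights       # context vector, attention

T, d = 6, 4
enc = np.random.randn(T, d)                        # encoder states (toy)
ctx, w = attention_context(np.random.randn(d), enc)
print(w.round(2), ctx.shape)   # attention over the 6 words, context of shape (4,)
```
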
Towards acoustic model unification across dialects
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846328
Mohamed G. Elfeky, M. Bastani, Xavier Velez, P. Moreno, Austin Waters
Abstract: Acoustic model performance typically decreases when evaluated on a dialectal variation of the same language that was not used during training. Similarly, models simultaneously trained on a group of dialects tend to underperform dialect-specific models. In this paper, we report on our efforts towards building a unified acoustic model that can serve a multi-dialectal language. Two techniques are presented: Distillation and Multitask Learning (MTL). In Distillation, we use an ensemble of dialect-specific acoustic models and distill its knowledge into a single model. In MTL, we utilize multitask learning to train a unified acoustic model that learns to distinguish dialects as a side task. We show that both techniques are superior to the jointly-trained model trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively. While achieving this improvement, neither technique degrades the performance of the dialect-specific models by more than 3.4%.
Citations: 29
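
A short PyTorch sketch of the Distillation technique described above, assuming the standard temperature-smoothed formulation: averaged soft posteriors from the dialect-specific teachers supervise the single unified student. Dimensions and the temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, T=2.0):
    # Average the dialect-specific teachers' tempered posteriors.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL(teacher || student), scaled by T^2 as is standard for distillation.
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * T * T

student = torch.randn(8, 40)                       # batch of 8, 40 classes (toy)
teachers = [torch.randn(8, 40) for _ in range(3)]  # 3 dialect-specific models
print(distillation_loss(student, teachers).item())
```
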
Blind speech segmentation using spectrogram image-based features and Mel cepstral coefficients
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846324
Adriana Stan, Cassia Valentini-Botinhao, B. Orza, M. Giurgiu
Abstract: This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We consider the spectrogram of an utterance's waveform as an image and hypothesize that its striping defects, i.e., discontinuities, appear due to phone boundaries. These discontinuities are found using a simple image destriping algorithm. To discover phone transitions that are less salient in the image, we compute spectral changes derived from the time evolution of a Mel cepstral parametrisation of the speech. These so-called image-based and acoustic features are then combined to form a mixed probability function, whose values indicate the likelihood of a phone boundary being located at the corresponding time frame. The method is completely unsupervised and achieves an accuracy of 75.59% at a −3.26% over-segmentation rate, yielding an F-measure of 0.76 and a 0.80 R-value on the TIMIT dataset.
Citations: 11
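
A rough numpy/scipy sketch of the mixed probability function idea, assuming a simple linear combination: one cue from column-wise spectrogram discontinuities (the "striping"), one from frame-to-frame Mel-cepstral change, normalised, mixed, and peak-picked. The synthetic features and the weight alpha are placeholders, not the paper's actual destriping algorithm:

```python
import numpy as np
from scipy.signal import find_peaks

def boundary_likelihood(spectrogram, mfcc, alpha=0.5):
    """spectrogram: (F, T); mfcc: (C, T). Returns per-frame likelihood (T-1,)."""
    img_cue = np.abs(np.diff(spectrogram, axis=1)).sum(0)   # column discontinuity
    ac_cue = np.linalg.norm(np.diff(mfcc, axis=1), axis=0)  # spectral change
    norm = lambda x: (x - x.min()) / (np.ptp(x) + 1e-9)     # rescale to [0, 1]
    return alpha * norm(img_cue) + (1 - alpha) * norm(ac_cue)

T = 200
spec = np.abs(np.random.randn(128, T)).cumsum(1) % 3.0   # toy "spectrogram"
mfcc = np.random.randn(13, T)                            # toy cepstral features
like = boundary_likelihood(spec, mfcc)
peaks, _ = find_peaks(like, distance=5)   # candidate phone boundaries (frames)
print(peaks[:10])
```
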
The NDSC transcription system for the 2016 multi-genre broadcast challenge
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846276
Xukui Yang, Dan Qu, Wenlin Zhang, Weiqiang Zhang
Abstract: The National Digital Switching System Engineering and Technological R&D Center (NDSC) speech-to-text transcription system for the 2016 multi-genre broadcast challenge is described. Various acoustic models based on deep neural networks (DNNs) are trained, including hybrid DNN, long short-term memory recurrent neural network (LSTM RNN), and time-delay neural network (TDNN) models. The system also makes use of recurrent neural network language models (RNNLMs) for re-scoring and minimum Bayes risk (MBR) combination. The WER on the test dataset of the speech-to-text task is 18.2%. Furthermore, to simulate real applications where manual segmentations are not available, an automatic segmentation system based on long-term information is proposed. WERs based on the automatically generated segments are slightly worse than those based on the manual segmentations.
Citations: 6
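
A toy sketch of the N-best rescoring step mentioned above: each hypothesis' total score is re-combined from its acoustic score and an interpolation of RNNLM and n-gram log-probabilities before re-ranking. The stand-in scoring functions and weights are illustrative, not the NDSC system's models:

```python
def rescore_nbest(nbest, ngram_lp, rnnlm_lp, lm_scale=12.0, lam=0.5):
    """nbest: list of (words, acoustic_logprob); *_lp: sentence log-prob fns."""
    def total(hyp):
        words, ac = hyp
        # Interpolate the two language models, then combine with the acoustics.
        lm = lam * rnnlm_lp(words) + (1 - lam) * ngram_lp(words)
        return ac + lm_scale * lm
    return max(nbest, key=total)

# Toy stand-in language models (a real system would query trained LMs here).
ngram = lambda w: -0.5 * len(w)
rnnlm = lambda w: -0.4 * len(w) - (0.0 if w[-1] == "</s>" else 1.0)
nbest = [(["the", "cat", "</s>"], -10.0),
         (["a", "cat", "sat", "</s>"], -9.0)]
print(rescore_nbest(nbest, ngram, rnnlm)[0])   # -> ['the', 'cat', '</s>']
```
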
Pre-filtered dynamic time warping for posteriorgram based keyword search
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846292
Gozde Cetinkaya, Batuhan Gündogdu, M. Saraçlar
Abstract: In this study, we present a pre-filtering method for dynamic time warping (DTW) to improve the efficiency of a posteriorgram based keyword search (KWS) system. The ultimate aim is to improve the performance of a large vocabulary continuous speech recognition (LVCSR) based KWS system using the posteriorgram based KWS approach. We use phonetic posteriorgrams to represent the audio data and generate average posteriorgrams to represent the given text queries. The DTW algorithm is used to determine the optimal alignment between the posteriorgrams of the audio data and the queries. Since DTW has quadratic complexity, it can be relatively inefficient for keyword search. Our main contribution is to reduce this complexity by pre-filtering based on a vector space representation of the two posteriorgrams without any degradation in performance. Experimental results show that our system reduces the complexity and, when combined with the baseline LVCSR based KWS system, it improves the performance both for the out-of-vocabulary (OOV) queries and the in-vocabulary (IV) queries.
Citations: 3
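
A minimal numpy sketch of the pre-filtering idea: each candidate window is first screened with a cheap cosine similarity between averaged posteriorgrams, and the quadratic-cost DTW alignment runs only on windows that pass. The threshold, hop, and synthetic posteriorgrams are illustrative assumptions:

```python
import numpy as np

def dtw_cost(A, B):
    """A: (m, d), B: (n, d) posteriorgrams; returns normalised alignment cost."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n] / (m + n)

def search(query, audio, win, hop=5, thresh=0.8):
    q_mean = query.mean(0)                 # average posteriorgram of the query
    hits = []
    for s in range(0, len(audio) - win + 1, hop):
        seg = audio[s:s + win]
        v = seg.mean(0)
        cos = v @ q_mean / (np.linalg.norm(v) * np.linalg.norm(q_mean) + 1e-9)
        if cos >= thresh:                         # cheap vector-space pre-filter
            hits.append((s, dtw_cost(query, seg)))  # expensive DTW only here
    return sorted(hits, key=lambda h: h[1])

rng = np.random.default_rng(1)
audio = rng.random((500, 40)); audio /= audio.sum(1, keepdims=True)
query = audio[100:140].copy()              # plant the "keyword" at frame 100
print(search(query, audio, win=40)[:3])    # best match should start near 100
```
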
A factor analysis model of sequences for language recognition
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846287
M. Omar
Abstract: The application of joint factor analysis [1] to speaker and language recognition advanced the performance of automatic systems in these areas. A special case of the early work in [1], namely the i-vector representation [2], has been applied successfully in many areas including speaker [2], language [3], and speech recognition [4]. This work presents a novel model which represents a long sequence of observations using a factor analysis model of shorter overlapping subsequences. This model takes into consideration the dependency of adjacent latent vectors. It is shown that this model outperforms the current joint factor analysis approach, which assumes independent and identically distributed (iid) observations given one global latent vector. In addition, we replace the language-independent prior model of the latent vector in the i-vector model with a language-dependent prior model, and we modify the objective function used in the estimation of the factor analysis projection matrix and the prior model to correspond to the cross-entropy objective function estimated under this new model. We also derive the update equations of the projection matrix and the prior model parameters which maximize the cross-entropy objective function. We evaluate the performance of our approach on the language recognition task of the robust automatic transcription of speech (RATS) project. Our experiments show relative improvements of up to 11% in terms of equal error rate using the proposed approach, compared to the standard approach of using an i-vector representation [2].
Citations: 1
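
For context, the baseline i-vector factor analysis model cited above, and the prior the paper replaces, can be sketched as follows; the per-language notation in the last line is ours, not the paper's:

```latex
% Baseline i-vector model: the utterance supervector M is a low-rank shift
% of the universal background model (UBM) mean supervector m,
% with T the total variability matrix and w the latent i-vector:
M = m + T\,w, \qquad w \sim \mathcal{N}(0, I)
% The language-dependent prior proposed above (notation ours), for language \ell:
w \sim \mathcal{N}(\mu_\ell, \Sigma_\ell)
```
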
Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846280
T. A. Hanai, Wei-Ning Hsu, James R. Glass
Abstract: The Arabic language, with over 300 million speakers, has significant diversity and breadth. This proves challenging when building an automated system to understand what is said. This paper describes an Arabic automatic speech recognition system developed on a 1,200-hour speech corpus that was made available for the 2016 Arabic Multi-genre Broadcast (MGB) Challenge. A range of Deep Neural Network (DNN) topologies were modeled, including feed-forward, convolutional, time-delay, recurrent Long Short-Term Memory (LSTM), Highway LSTM (H-LSTM), and Grid LSTM (G-LSTM) networks. The best performance came from a sequence-discriminatively trained G-LSTM neural network. The best overall Word Error Rate (WER) was 18.3% (p < 0.001) on the development set, after combining hypotheses from 3- and 5-layer sequence-discriminatively trained G-LSTM models that had been rescored with a 4-gram language model.
Citations: 15
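
As a toy illustration of the hypothesis combination step, here is a confidence-weighted vote per word position, assuming the hypotheses are already aligned to equal length; real systems (e.g. ROVER-style combination) perform that alignment first, which is omitted here:

```python
from collections import defaultdict

def combine(hypotheses):
    """hypotheses: list of (word_list, confidence); equal-length word lists."""
    out = []
    for pos in range(len(hypotheses[0][0])):
        votes = defaultdict(float)
        for words, conf in hypotheses:
            votes[words[pos]] += conf          # weight each vote by confidence
        out.append(max(votes, key=votes.get))  # keep the best-supported word
    return out

h1 = (["the", "cat", "sat"], 0.6)
h2 = (["the", "bat", "sat"], 0.3)
h3 = (["a",   "cat", "sat"], 0.3)
print(combine([h1, h2, h3]))   # -> ['the', 'cat', 'sat']
```
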
Syntax or semantics? Knowledge-guided joint semantic frame parsing
2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846288
Yun-Nung (Vivian) Chen, Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Jianfeng Gao, L. Deng
Abstract: Spoken language understanding (SLU) is a core component of a spoken dialogue system; it involves intent prediction and slot filling and is also called semantic frame parsing. Recently, recurrent neural networks (RNNs) have obtained strong results on SLU due to their superior ability to preserve sequential information over time. Traditionally, the SLU component parses semantic frames for utterances considering their flat structures, as the underlying RNN structure is a linear chain. However, natural language exhibits linguistic properties that provide rich, structured information for better understanding. This paper proposes to apply knowledge-guided structural attention networks (K-SAN), which additionally incorporate non-flat network topologies guided by prior knowledge, to a language understanding task. With an attention mechanism, the model can effectively identify the salient substructures that are essential to parse a given utterance into its semantic frame, where two types of knowledge, syntax and semantics, are utilized. Experiments on the benchmark Air Travel Information System (ATIS) data and the conversational assistant Cortana data show that 1) the proposed K-SAN models with syntax or semantics outperform the state-of-the-art neural-network-based results, and 2) the improvement for joint semantic frame parsing is more significant, because the structured information provides rich cues for sentence-level understanding, where intent prediction and slot filling can be mutually improved.
Citations: 49
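
A minimal PyTorch sketch of the joint semantic frame parsing setting that K-SAN improves on: one shared encoder with two heads, utterance-level intent and per-token slot tags, trained jointly. The knowledge-guided structural attention itself is not reproduced here, and the label counts are ATIS-like guesses:

```python
import torch
import torch.nn as nn

class JointSLU(nn.Module):
    def __init__(self, vocab, n_intents, n_slots, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.enc = nn.GRU(d, d, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * d, n_intents)  # utterance-level head
        self.slot_head = nn.Linear(2 * d, n_slots)      # per-token head

    def forward(self, tokens):                 # tokens: (B, T) integer ids
        h, _ = self.enc(self.emb(tokens))      # (B, T, 2d) encoder states
        intent_logits = self.intent_head(h.mean(1))   # pool over time
        slot_logits = self.slot_head(h)               # (B, T, n_slots)
        return intent_logits, slot_logits

model = JointSLU(vocab=1000, n_intents=18, n_slots=127)  # ATIS-like sizes
toks = torch.randint(0, 1000, (2, 12))
intent, slots = model(toks)
print(intent.shape, slots.shape)  # torch.Size([2, 18]) torch.Size([2, 12, 127])
```
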