2016 IEEE Spoken Language Technology Workshop (SLT): Latest Papers (Page 3)

BBN Technologies' OpenSAD system

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846238

Scott Novotney, D. Karakos, J. Silovský, R. Schwartz

Citations: 2

Automated structure discovery and parameter tuning of neural network language model based on evolution strategy

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846334

Tomohiro Tanaka, Takafumi Moriya, T. Shinozaki, Shinji Watanabe, Takaaki Hori, Kevin Duh

Citations: 16

Automated optimization of decoder hyper-parameters for online LVCSR

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846303

Akshay Chandrashekaran, Ian Lane

Abstract: In this paper, we explore the usage of automated hyper-parameter optimization techniques with scalarization of multiple objectives to find decoder hyper-parameters suitable for a given acoustic and language model for an LVCSR task. We compare manual optimization, random sampling, tree of Parzen estimators, Bayesian Optimization, and a genetic algorithm to find a technique that yields better performance than manual optimization in a comparable number of hyper-parameter evaluations. We focus on a scalar combination of word error rate (WER), log of real time factor (logRTF), and peak memory usage, formulated using the augmented Tchebyscheff function (ATF), as the objective function for the automated techniques. For this task, with a constraint on the maximum number of objective evaluations, we find that the best automated optimization technique, Bayesian Optimization, outperforms manual optimization by 8% in terms of ATF. We find that memory usage was not a very useful distinguishing factor between different hyper-parameter settings, with trade-offs occurring between RTF and WER a majority of the time. We also try to perform optimization of WER with a hard constraint on the real time factor of 0.1. In this case, performing constrained Bayesian Optimization yields a model that provides an improvement of 2.7% over the best model obtained from manual optimization with 60% of the number of evaluations.
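The scalarization mentioned in the abstract can be illustrated with a minimal Python sketch. It uses a common textbook form of the augmented Tchebycheff function; the exact formulation, weights, ideal point, and `rho` used by the authors are not given in the abstract, so every number and the helper name `decoder_score` below are assumptions made purely for illustration.

```python
import math


def augmented_tchebycheff(objectives, weights, ideal, rho=0.05):
    """Scalarize several minimization objectives into one score (lower is better).

    Textbook form (an assumption, not necessarily the paper's exact variant):
    max_i(w_i * |f_i - z_i|) + rho * sum_i(w_i * |f_i - z_i|)
    """
    terms = [w * abs(f - z) for f, w, z in zip(objectives, weights, ideal)]
    return max(terms) + rho * sum(terms)


def decoder_score(wer, rtf, peak_mem_gb):
    # Hypothetical weights and ideal point, chosen only for illustration.
    objectives = [wer, math.log(rtf), peak_mem_gb]
    weights = [1.0, 1.0, 0.1]
    ideal = [0.0, math.log(0.01), 0.0]
    return augmented_tchebycheff(objectives, weights, ideal)


# Two made-up decoder settings: WER in percent, RTF, peak memory in GB.
fast = decoder_score(wer=12.0, rtf=0.05, peak_mem_gb=2.0)
accurate = decoder_score(wer=10.5, rtf=0.40, peak_mem_gb=2.5)

# An optimizer would keep the setting with the lowest scalar score.
best = min(("fast", fast), ("accurate", accurate), key=lambda pair: pair[1])
```

Any of the search methods listed in the abstract (random sampling, Bayesian optimization, and so on) could then minimize `decoder_score` over the decoder's hyper-parameters, treating it as a single black-box objective.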

Citations: 7

Parallel Long Short-Term Memory for multi-stream classification

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846268

Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, G. Linarès, R. Mori

Abstract: Recently, machine learning methods have provided a broad spectrum of original and efficient algorithms based on Deep Neural Networks (DNN) to automatically predict an outcome with respect to a sequence of inputs. Recurrent hidden cells allow DNN-based models such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) to manage long-term dependencies. Nevertheless, these RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM) directions. Most of the information available nowadays, however, comes from multistream or multimedia documents, which requires RNNs to process this information synchronously during training. This paper presents an original LSTM-based architecture, named Parallel LSTM (PLSTM), that processes multiple parallel synchronized input sequences in order to predict a common output. The proposed PLSTM method could be used for parallel sequence classification purposes. The PLSTM approach is evaluated on an automatic telecast genre sequence classification task and compared with different state-of-the-art architectures. Results show that the proposed PLSTM method outperforms the baseline n-gram models as well as the state-of-the-art LSTM approach.
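As a rough illustration of the idea of synchronized parallel streams feeding a common output, the following pure-Python sketch runs one LSTM cell per stream in lockstep and concatenates the final hidden states. The abstract does not say how PLSTM merges its streams, so the concatenation step, the class `LSTMCell`, the function `parallel_lstm_features`, and all sizes and inputs are assumptions for illustration, not the authors' architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A standard LSTM cell with untrained random weights (illustration only)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight row per gate unit over [x, h, bias]; gate order: i, f, o, g.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid + 1)]
                  for _ in range(4 * n_hid)]

    def step(self, x, h, c):
        z = x + h + [1.0]  # concatenate input, previous hidden state, bias term
        pre = [sum(w * v for w, v in zip(row, z)) for row in self.W]
        n = self.n_hid
        i = [sigmoid(p) for p in pre[0:n]]            # input gate
        f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]    # output gate
        g = [math.tanh(p) for p in pre[3 * n:4 * n]]  # candidate cell values
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def parallel_lstm_features(streams, cells):
    """Step one cell per stream in lockstep; concatenate the final hidden states."""
    steps = len(streams[0])
    assert all(len(s) == steps for s in streams), "streams must be synchronized"
    states = [([0.0] * cell.n_hid, [0.0] * cell.n_hid) for cell in cells]
    for t in range(steps):
        states = [cell.step(s[t], h, c)
                  for cell, s, (h, c) in zip(cells, streams, states)]
    # Assumed merge: concatenation; a classifier layer would consume this vector.
    return [v for h, _ in states for v in h]


rng = random.Random(0)
cells = [LSTMCell(n_in=2, n_hid=3, rng=rng), LSTMCell(n_in=1, n_hid=3, rng=rng)]
stream_a = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]  # made-up 2-dim stream
stream_b = [[1.0], [0.0], [1.0]]                 # made-up 1-dim stream
features = parallel_lstm_features([stream_a, stream_b], cells)
```

The resulting `features` vector has one block of hidden units per stream; a softmax layer trained on top of it would produce the common prediction.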

Citations: 11

Influence of corpus size and content on the perceptual quality of a unit selection MaryTTS voice

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846336

Florian Hinterleitner, Benjamin Weiss, S. Möller

Citations: 2

Automatic plagiarism detection for spoken responses in an assessment of English language proficiency

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846254

Xinhao Wang, Keelan Evanini, James V. Bruno, Matthew David Mulholland

Citations: 8

Robust utterance classification using multiple classifiers in the presence of speech recognition errors

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846291

Takeshi Homma, Kazuaki Shima, Takuya Matsumoto

Citations: 2

Automatic turn segmentation for Movie & TV subtitles

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846272

Pierre Lison, R. Meena

Citations: 23

End-to-End attention based text-dependent speaker verification

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846261

Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Y. Gong

Abstract: A new type of End-to-End system for text-dependent speaker verification is presented in this paper. Previously, using the phonetic discriminate/speaker discriminate DNN as a feature extractor for speaker verification has shown promising results. The extracted frame-level (bottleneck, posterior or d-vector) features are equally weighted and aggregated to compute an utterance-level speaker representation (d-vector or i-vector). In this work we use a speaker discriminate CNN to extract the noise-robust frame-level features. These features are smartly combined to form an utterance-level speaker vector through an attention mechanism. The proposed attention model takes the speaker discriminate information and the phonetic information to learn the weights. The whole system, including the CNN and attention model, is jointly optimized using an end-to-end criterion. The training algorithm imitates exactly the evaluation process, directly mapping a test utterance and a few target speaker utterances into a single verification score. The algorithm can smartly select the most similar impostor for each target speaker to train the network. We demonstrated the effectiveness of the proposed end-to-end system on the Windows 10 "Hey Cortana" speaker verification task.
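The contrast the abstract draws between equally weighted aggregation and attention-weighted aggregation of frame-level features can be sketched in a few lines of Python. In the paper the attention weights are learned from speaker and phonetic information; here the per-frame `scores` are simply supplied by hand, and the cosine comparison against an `enrolled` vector is an assumed stand-in for the verification score, so all values and names are hypothetical.

```python
import math


def mean_pool(frames):
    """Equal-weight average of frame-level feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def attention_pool(frames, scores):
    """Softmax the per-frame scores and return (weighted average, weights)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up 2-dim frame features; the third frame plays the role of a noisy frame.
frames = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
scores = [2.0, 2.0, -2.0]  # hand-picked; a trained attention model would produce these
utt_vec, weights = attention_pool(frames, scores)

enrolled = [0.9, 0.1]  # hypothetical enrolled speaker vector
verification_score = cosine(utt_vec, enrolled)
baseline_score = cosine(mean_pool(frames), enrolled)
```

With equal scores, `attention_pool` reduces to `mean_pool`, which makes the equally weighted baseline a special case of the attention-weighted one.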

Citations: 173

Attribute based shared hidden layers for cross-language knowledge transfer

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI: 10.1109/SLT.2016.7846327

Vipul Arora, A. Lahiri, Henning Reetz

Citations: 2