Latest publications from the 2016 IEEE Spoken Language Technology Workshop (SLT)

Multimodal deep neural nets for detecting humor in TV sitcoms
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846293
D. Bertero, Pascale Fung
Abstract: We propose a novel approach to predicting humor in dialogues that combines acoustic and language features in a deep neural network. We analyze data from three popular TV sitcoms, whose canned laughter indicates when the audience would react. We model the setup-punchline sequential relation of conversational humor with a Long Short-Term Memory network, with utterance encodings obtained from two Convolutional Neural Networks: one modeling word-level language features and the other modeling frame-level acoustic and prosodic features. Our neural network framework improves the F-score by over 5% over a Conditional Random Field baseline trained on a similar combination of acoustic and language features, achieving much higher recall. It is also more effective than a language-features-only setting, with an F-score 10% higher. It generalizes well, too, in most cases reaching precision above 70% when trained and tested on different sitcoms.
Cited by: 5
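A minimal PyTorch sketch of the described architecture — two CNN utterance encoders feeding a dialog-level LSTM — makes the data flow concrete. All dimensions, kernel sizes, and the max-over-time pooling are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class UtteranceCNN(nn.Module):
    """Encode a variable-length sequence (words or acoustic frames) into a fixed vector."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)  # max-over-time pooling (an assumption)

    def forward(self, x):  # x: (batch, time, in_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))
        return self.pool(h).squeeze(-1)  # (batch, hidden)

class HumorDetector(nn.Module):
    """Two CNN utterance encoders feed an LSTM that models the setup-punchline sequence."""
    def __init__(self, word_dim=300, acoustic_dim=40, hidden=64):
        super().__init__()
        self.word_cnn = UtteranceCNN(word_dim, hidden)
        self.acoustic_cnn = UtteranceCNN(acoustic_dim, hidden)
        self.lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, words, frames):
        # words: (batch, n_utts, n_words, word_dim); frames: (batch, n_utts, n_frames, acoustic_dim)
        b, u = words.shape[:2]
        w = self.word_cnn(words.flatten(0, 1)).view(b, u, -1)
        a = self.acoustic_cnn(frames.flatten(0, 1)).view(b, u, -1)
        h, _ = self.lstm(torch.cat([w, a], dim=-1))
        return torch.sigmoid(self.out(h)).squeeze(-1)  # punchline probability per utterance

model = HumorDetector()
probs = model(torch.randn(2, 8, 20, 300), torch.randn(2, 8, 100, 40))
print(probs.shape)  # torch.Size([2, 8])
```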
I-Vector estimation as auxiliary task for Multi-Task Learning based acoustic modeling for automatic speech recognition
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846237
Gueorgui Pironkov, S. Dupont, T. Dutoit
Abstract: I-Vectors have been successfully applied in the speaker identification community to characterize the speaker and the acoustic environment. Recently, i-vectors have also shown their usefulness in automatic speech recognition when concatenated with standard acoustic features. Instead of directly feeding the acoustic model with i-vectors, we investigate a Multi-Task Learning approach in which a neural network is trained to simultaneously recognize phone-state posterior probabilities and extract i-vectors from the standard acoustic features. Multi-Task Learning is a regularization method that aims to improve the network's generalization ability by training a single network to solve several different but related tasks. The core idea of using i-vector extraction as an auxiliary task is to give the network additional inter-speaker awareness and thus reduce overfitting. Overfitting is a common issue in speech recognition and is especially harmful when the amount of training data is limited. The proposed setup is trained and tested on the TIMIT database, with acoustic modeling performed by a Recurrent Neural Network with Long Short-Term Memory cells.
Cited by: 3
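A hedged sketch of the Multi-Task Learning idea: one shared network with a phone-state classification head and an i-vector regression head. Layer sizes, the 100-dimensional i-vector, and the auxiliary loss weight are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLAcousticModel(nn.Module):
    """Shared recurrent layers with a phone-state head (main task) and an
    i-vector regression head (auxiliary task)."""
    def __init__(self, feat_dim=40, n_states=1936, ivec_dim=100, hidden=512):
        super().__init__()
        self.shared = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.state_head = nn.Linear(hidden, n_states)
        self.ivec_head = nn.Linear(hidden, ivec_dim)

    def forward(self, feats):  # feats: (batch, time, feat_dim)
        h, _ = self.shared(feats)
        return self.state_head(h), self.ivec_head(h)

def mtl_loss(state_logits, ivec_pred, state_targets, ivec_target, aux_weight=0.1):
    """Cross-entropy on phone states plus weighted MSE against the utterance
    i-vector, which is broadcast to every frame."""
    ce = F.cross_entropy(state_logits.flatten(0, 1), state_targets.flatten())
    mse = F.mse_loss(ivec_pred, ivec_target.unsqueeze(1).expand_as(ivec_pred))
    return ce + aux_weight * mse

model = MTLAcousticModel()
states, ivecs = model(torch.randn(4, 200, 40))
loss = mtl_loss(states, ivecs, torch.randint(0, 1936, (4, 200)), torch.randn(4, 100))
print(loss.item())
```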
The MSIIP system for dialog state tracking challenge 5
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846313
Ying Su, Miao Li, Ji Wu
Abstract: We present our work in Dialog State Tracking Challenge 5, whose main task is to track dialog state in cross-language human-human conversations. First, a probabilistic enhanced framework is used to represent each sub-dialog; it consists of three parts: an input model that extracts features, an enhanced model that updates the dialog state, and an output model that produces the tracking frame. Meanwhile, parallel-language systems are proposed to overcome inaccuracies caused by machine translation in cross-language testing. We also introduce a new iterative alignment method extended from our work in DSTC4. Furthermore, a slot-based score averaging method is introduced to build an ensemble that combines different trackers. Results of our DSTC5 system show that our method significantly improves tracking performance compared with the baseline method.
Cited by: 1
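The slot-based score averaging can be pictured with a small sketch: each tracker emits a per-slot distribution over values, the ensemble averages the scores slot by slot, and the top value per slot is kept. The slot and value names and the uniform weights below are invented for illustration.

```python
from collections import defaultdict

def ensemble_slot_scores(tracker_outputs, weights=None):
    """tracker_outputs: list of dicts  slot -> {value: score}."""
    weights = weights or [1.0 / len(tracker_outputs)] * len(tracker_outputs)
    combined = defaultdict(lambda: defaultdict(float))
    for w, output in zip(weights, tracker_outputs):
        for slot, scores in output.items():
            for value, score in scores.items():
                combined[slot][value] += w * score
    # Keep the best-scoring value per slot.
    return {slot: max(vals, key=vals.get) for slot, vals in combined.items()}

trackers = [
    {"INFO": {"Pricerange": 0.7, "Cuisine": 0.3}},
    {"INFO": {"Pricerange": 0.4, "Cuisine": 0.6}},
]
print(ensemble_slot_scores(trackers))  # {'INFO': 'Pricerange'}
```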
Neural dialog state tracker for large ontologies by attention mechanism
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846314
Youngsoo Jang, Jiyeon Ham, Byung-Jun Lee, Youngjae Chang, Kee-Eung Kim
Abstract: This paper presents, in detail, a dialog state tracker submitted to Dialog State Tracking Challenge 5 (DSTC 5). To tackle the challenging cross-language human-human dialog state tracking task with limited training data, we propose a tracker that focuses on words with meaningful context, based on an attention mechanism and bi-directional long short-term memory (LSTM). The vocabulary, which includes many proper nouns, is vectorized using a large amount of related text crawled from the web, in order to learn good embeddings for words that do not appear in the training dialogs. Despite its simplicity, our proposed tracker achieves high accuracy without sophisticated pre- and post-processing.
Cited by: 17
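A minimal sketch of the attention-over-BiLSTM idea: attend over word encodings to form an utterance summary, then score the ontology values. The dimensions, the additive attention form, and the single-output-layer scoring are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionTracker(nn.Module):
    """BiLSTM over pre-trained word vectors; attention picks out informative
    words; the summary is scored against the ontology values."""
    def __init__(self, emb_dim=100, hidden=64, n_values=500):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_values)

    def forward(self, word_vecs):  # (batch, n_words, emb_dim)
        h, _ = self.bilstm(word_vecs)                              # (batch, n_words, 2*hidden)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=-1)  # attention over words
        summary = (weights.unsqueeze(-1) * h).sum(dim=1)           # weighted utterance summary
        return self.out(summary)                                   # one score per ontology value

tracker = AttentionTracker()
print(tracker(torch.randn(2, 30, 100)).shape)  # torch.Size([2, 500])
```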
Performance monitoring for automatic speech recognition in noisy multi-channel environments
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846244
B. Meyer, Sri Harish Reddy Mallidi, Angel Mario Castro Martinez, G. P. Vayá, H. Kayser, H. Hermansky
Abstract: In many applications of machine listening it is useful to know how well an automatic speech recognition system will perform before the actual recognition is carried out. In this study we investigate different performance measures with the aim of predicting word error rates (WERs) in spatial acoustic scenes in which the type of noise, the signal-to-noise ratio, the parameters for spatial filtering, and the amount of reverberation are varied. All measures under consideration are based on phoneme posteriorgrams obtained from a deep neural net. While frame-wise entropy exhibits only moderate predictive power for factors other than additive noise, we found that the mean temporal distance between posterior vectors (M-Measure) as well as matched phoneme filters (MaP) exhibit excellent correlations with WER across all conditions. Since our results were obtained with simulated behind-the-ear hearing aid signals, we discuss possible applications for speech-aware hearing devices.
Cited by: 16
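Two of these measures are easy to sketch from a posteriorgram: frame-wise entropy and the M-Measure, here computed as the mean divergence between posteriors a fixed number of frames apart. The symmetric-KL divergence and the span settings are assumptions, not necessarily the paper's exact definition.

```python
import numpy as np

def frame_entropy(posteriors):
    """Mean per-frame entropy of a posteriorgram (n_frames, n_phones)."""
    p = np.clip(posteriors, 1e-10, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

def m_measure(posteriors, spans=(10, 20, 30, 40, 50)):
    """Mean symmetric KL divergence between posterior vectors a fixed number
    of frames apart, averaged over several spans."""
    p = np.clip(posteriors, 1e-10, 1.0)
    divs = []
    for dt in spans:
        a, b = p[:-dt], p[dt:]
        skl = np.sum(a * np.log(a / b) + b * np.log(b / a), axis=1)
        divs.append(np.mean(skl))
    return float(np.mean(divs))  # higher = posteriors evolve more = typically lower WER

rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 40))
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(frame_entropy(post), m_measure(post))
```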
LIUM ASR systems for the 2016 Multi-Genre Broadcast Arabic challenge
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846278
N. Tomashenko, Kevin Vythelingum, Anthony Rousseau, Y. Estève
Abstract: This paper describes the automatic speech recognition (ASR) systems developed by LIUM in the framework of the 2016 Multi-Genre Broadcast (MGB-2) Challenge in the Arabic language. LIUM participated in the first of the two proposed tasks, namely the speech-to-text transcription of Aljazeera recordings. We present the approaches and details of our systems, as well as our results in the evaluation campaign, in which the primary LIUM ASR system attained second place. The main aspects are the use of GMM-derived features for training a DNN combined with time-delay neural networks for acoustic models, the use of two different approaches to automatically phonetize Arabic words, and finally the training-data selection strategy for acoustic and language models.
Cited by: 17
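As a rough illustration of GMM-derived features, the sketch below uses per-frame log-likelihoods from a bank of GMMs (one per HMM state) as the DNN input vector. The scikit-learn mixtures and random data stand in for a conventional GMM-HMM system; the paper's actual GMMD feature pipeline differs in detail.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_derived_features(frames, gmms):
    """frames: (n_frames, feat_dim); gmms: one fitted GaussianMixture per
    HMM state. Returns (n_frames, n_states) log-likelihood features."""
    return np.stack([g.score_samples(frames) for g in gmms], axis=1)

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 13))  # stand-in MFCC frames
# Three toy "states", each fitted on a shifted training distribution:
gmms = [GaussianMixture(n_components=4, random_state=i).fit(train + i) for i in range(3)]
feats = gmm_derived_features(rng.normal(size=(50, 13)), gmms)
print(feats.shape)  # (50, 3)
```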
Multilingual BLSTM and speaker-specific vector adaptation in 2016 BUT Babel system
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846330
M. Karafiát, M. Baskar, P. Matejka, Karel Veselý, F. Grézl, J. Černocký
Abstract: This paper provides an extensive summary of the BUT 2016 system for the last IARPA Babel evaluations. It concentrates on multilingual training of both deep neural network (DNN)-based feature extraction and acoustic models, including multilingual training of bidirectional Long Short-Term Memory networks. Next, two low-dimensional vector approaches to speaker adaptation are investigated: i-vectors and sequence-summarizing neural networks (SSNN). The results, provided for three Babel Year 4 languages, show a clear advantage for both approaches when only a limited amount of training data is available. The time necessary for developing a new system is addressed too, as some of the investigated techniques do not require extensive re-training of the whole system.
Cited by: 23
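A hedged sketch of a sequence-summarizing network: a frame-level network whose outputs are averaged over the utterance to produce one speaker vector, which is then appended to the acoustic features in the style of i-vector adaptation. Layer sizes and the 100-dimensional summary are assumptions.

```python
import torch
import torch.nn as nn

class SequenceSummarizer(nn.Module):
    """Frame-level network whose outputs are averaged over the utterance,
    yielding one speaker vector used like an i-vector."""
    def __init__(self, feat_dim=40, summary_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, summary_dim),
        )

    def forward(self, feats):               # (batch, time, feat_dim)
        return self.net(feats).mean(dim=1)  # (batch, summary_dim)

summarizer = SequenceSummarizer()
feats = torch.randn(2, 300, 40)
spk_vec = summarizer(feats)
# Append the same summary vector to every frame before the acoustic model:
adapted = torch.cat([feats, spk_vec.unsqueeze(1).expand(-1, feats.size(1), -1)], dim=-1)
print(adapted.shape)  # torch.Size([2, 300, 140])
```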
Unsupervised k-means clustering based out-of-set candidate selection for robust open-set language recognition
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846284
Qian Zhang, J. Hansen
Abstract: Research in open-set language identification (LID) generally focuses more on accurate in-set language modeling than on improved out-of-set (OOS) language rejection. The main reason is the increased cost and resources required to collect sufficient OOS data, compared with the in-set languages of interest; rejecting unknown or OOS languages therefore remains a challenge. To address this through efficient data collection, we propose a flexible OOS candidate selection method for universal OOS language coverage. Since a state-of-the-art i-vector system followed by a generative Gaussian back-end achieves effective performance for LID, the selected K candidates are expected to be general enough to represent the entire OOS language space. We therefore propose an unsupervised k-means clustering approach for effective OOS candidate selection. The method is evaluated on a dataset derived from a large-scale corpus (LRE-09) containing 40 languages. With the proposed selection method, the total OOS training diversity can be reduced by 89% while still achieving better performance on both OOS rejection and overall classification. The proposed method also shows clear benefits as more data is added. The proposed solution thus sustains performance while efficiently employing a minimal number of OOS language candidates.
Cited by: 1
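The selection idea can be sketched directly: cluster per-language mean i-vectors with k-means and keep the language nearest each centroid as an OOS candidate. The random stand-in i-vectors and the nearest-to-centroid rule are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_oos_candidates(lang_ivectors, k):
    """lang_ivectors: dict language -> mean i-vector. Returns k language names."""
    names = list(lang_ivectors)
    X = np.stack([lang_ivectors[n] for n in names])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    picks = []
    for center in km.cluster_centers_:
        # Keep the language whose mean i-vector lies closest to this centroid.
        picks.append(names[int(np.argmin(np.linalg.norm(X - center, axis=1)))])
    return picks

rng = np.random.default_rng(0)
ivecs = {f"lang{i:02d}": rng.normal(size=400) for i in range(31)}  # 31 stand-in OOS languages
print(select_oos_candidates(ivecs, k=4))
```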
F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846338
Kazuhiro Kobayashi, T. Toda, Satoshi Nakamura
Abstract: This paper presents several F0 transformation techniques for statistical voice conversion (VC) with direct waveform modification with spectral differential (DIFFVC). Statistical VC is a technique for converting the speaker identity of a source speaker's voice into that of a target speaker by converting several acoustic features, such as spectral and excitation features. It usually uses a vocoder to generate converted speech waveforms from the converted acoustic features, but the vocoder often degrades the speech quality of the converted voice owing to insufficient parameterization accuracy. To avoid this issue, we previously proposed a direct waveform modification technique based on spectral differential filtering and successfully applied it to intra-gender singing VC (DIFFSVC), where excitation features need not be converted. We also applied it to cross-gender singing VC by implementing F0 transformation at a constant ratio, such as a one-octave increase or decrease. However, it is not straightforward to apply the DIFFSVC framework to normal speech conversion, because the F0 transformation ratio varies widely depending on the combination of source and target speakers. In this paper, we propose several F0 transformation techniques for DIFFVC and compare their performance in terms of the speech quality of the converted voice and the conversion accuracy of speaker individuality. The experimental results demonstrate that the F0 transformation technique based on waveform modification achieves the best performance among the proposed techniques.
Cited by: 16
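For context, a sketch of the conventional log-F0 linear transformation that speaker-dependent F0 conversion typically starts from: match the source speaker's log-F0 mean and variance to the target's. The statistics below are invented for the example; this is the generic technique, not the paper's specific proposals.

```python
import numpy as np

def transform_f0(f0_src, src_stats, tgt_stats):
    """Map voiced F0 values (Hz) linearly in the log domain; keep unvoiced (0) frames."""
    mu_s, std_s = src_stats
    mu_t, std_t = tgt_stats
    f0 = np.asarray(f0_src, dtype=float)
    voiced = f0 > 0
    lf0 = np.log(f0[voiced])
    f0_out = f0.copy()
    f0_out[voiced] = np.exp((lf0 - mu_s) / std_s * std_t + mu_t)
    return f0_out

f0 = np.array([0.0, 110.0, 115.0, 0.0, 120.0])  # male-like contour with unvoiced frames
# Invented log-F0 statistics for a male source and female target:
print(transform_f0(f0, (np.log(110), 0.15), (np.log(220), 0.20)))
```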
Analysis of the DNN-based SRE systems in multi-language conditions
2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date: 2016-12-01 DOI: 10.1109/SLT.2016.7846265
Ondrej Novotný, P. Matejka, O. Glembek, Oldrich Plchot, F. Grézl, L. Burget, J. Černocký
Abstract: This paper analyzes the behavior of our state-of-the-art Deep Neural Network/i-vector/PLDA-based speaker recognition systems in multi-language conditions. On the "Language Pack" of the PRISM set, we evaluate the systems' performance using NIST's standard metrics. We show that not only does the gain from using DNNs vanish and the use of dedicated DNNs for target conditions fail to help, but the DNN-based systems also tend to produce de-calibrated scores under the studied conditions. This work suggests directions for future research rather than offering particular solutions to these issues.
Cited by: 10
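The paper reports de-calibrated scores under language mismatch. As background, a common remedy (illustrative only, not the paper's method) is linear calibration: fit a logistic regression on held-out development scores to map raw scores to well-calibrated log-likelihood ratios. The scores below are random stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-in development scores: target trials score higher on average.
scores = np.concatenate([rng.normal(2.0, 1.0, 500), rng.normal(-1.0, 1.0, 500)])
labels = np.concatenate([np.ones(500), np.zeros(500)])

cal = LogisticRegression().fit(scores.reshape(-1, 1), labels)
a, b = cal.coef_[0, 0], cal.intercept_[0]
# With balanced classes the fitted log-odds a*s + b can be read directly
# as a calibrated log-likelihood ratio.
calibrated_llr = a * scores + b
print(a, b)
```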