2018 IEEE Spoken Language Technology Workshop (SLT) — Latest Publications

Analysing The Predictions Of a CNN-Based Replay Spoofing Detection System
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639666
Bhusan Chettri, Saumitra Mishra, Bob L. Sturm, Emmanouil Benetos
{"title":"Analysing The Predictions Of a CNN-Based Replay Spoofing Detection System","authors":"Bhusan Chettri, Saumitra Mishra, Bob L. Sturm, Emmanouil Benetos","doi":"10.1109/SLT.2018.8639666","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639666","url":null,"abstract":"Playing recorded speech samples of an enrolled speaker – “replay attack” – is a simple approach to bypass an automatic speaker verification (ASV) system. The vulnerability of ASV systems to such attacks has been acknowledged and studied, but there has been no research into what spoofing detection systems are actually learning to discriminate. In this paper, we analyse the local behaviour of a replay spoofing detection system based on convolutional neural networks (CNNs) adapted from a state-of-the-art CNN (LCNNFFT) submitted at the ASVspoof 2017 challenge. We generate temporal and spectral explanations for predictions of the model using the SLIME algorithm. Our findings suggest that in most instances of spoofing the model is using information in the first 400 milliseconds of each audio instance to make the class prediction. Knowledge of the characteristics that spoofing detection systems are exploiting can help build less vulnerable ASV systems, other spoofing detection systems, as well as better evaluation databases1.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114763688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
End-To-End Named Entity And Semantic Concept Extraction From Speech
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639513
Sahar Ghannay, Antoine Caubrière, Y. Estève, Nathalie Camelin, E. Simonnet, Antoine Laurent, E. Morin
{"title":"End-To-End Named Entity And Semantic Concept Extraction From Speech","authors":"Sahar Ghannay, Antoine Caubrière, Y. Estève, Nathalie Camelin, E. Simonnet, Antoine Laurent, E. Morin","doi":"10.1109/SLT.2018.8639513","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639513","url":null,"abstract":"Named entity recognition (NER) is among SLU tasks that usually extract semantic information from textual documents. Until now, NER from speech is made through a pipeline process that consists in processing first an automatic speech recognition (ASR) on the audio and then processing a NER on the ASR outputs. Such approach has some disadvantages (error propagation, metric to tune ASR systems sub-optimal in regards to the final task, reduced space search at the ASR output level,...) and it is known that more integrated approaches outperform sequential ones, when they can be applied. In this paper, we explore an end-to-end approach that directly extracts named entities from speech, though a unique neural architecture. On a such way, a joint optimization is possible for both ASR and NER. Experiments are carried on French data easily accessible, composed of data distributed in several evaluation campaigns. The results are promising since this end-to-end approach provides similar results (F-measure= 0.66 on test data) than a classical pipeline approach to detect named entity categories (F-measure=0.64). Last, we also explore this approach applied to semantic concept extraction, through a slot filling task known as a spoken language understanding problem, and also observe an improvement in comparison to a pipeline approach.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117299283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639623
Chunlei Zhang, Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu
{"title":"An Exploration of Directly Using Word as ACOUSTIC Modeling Unit for Speech Recognition","authors":"Chunlei Zhang, Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu","doi":"10.1109/SLT.2018.8639623","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639623","url":null,"abstract":"Conventional acoustic models for automatic speech recognition (ASR) are usually constructed from sub-word unit (e.g., context-dependent phoneme, grapheme, wordpiece etc.). Recent studies demonstrate that connectionist temporal classification (CTC) based acoustic-to-word (A2W) models are also promising for ASR. Such structures have drawn increasing attention as they can directly target words as output units, which simplify ASR pipeline by avoiding additional pronunciation lexicon, or even language model. In this study, we systematically explore to use word as acoustic modeling unit for conversational speech recognition. By replacing senone alignment with word alignment in a convolutional bidirectional LSTM architecture and employing a lexicon-free weighted finite-state transducer (WFST) based decoding, we greatly simplify conventional hybrid speech recognition system. On Hub5-2000 Switchboard/CallHome test sets with 300-hour training data, we achieve a WER that is close to the senone based hybrid systems with a WFST based decoding.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115864236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Training Speaker Recognition Models with Recording-Level Labels
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/slt.2018.8639601
Tanel Alumäe
{"title":"Training Speaker Recognition Models with Recording-Level Labels","authors":"Tanel Alumae","doi":"10.1109/slt.2018.8639601","DOIUrl":"https://doi.org/10.1109/slt.2018.8639601","url":null,"abstract":"","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121282217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Investigating Deep Neural Networks for Speaker Diarization in the DIHARD Challenge
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639630
Ivan Himawan, M. Rahman, S. Sridharan, C. Fookes, A. Kanagasundaram
{"title":"Investigating Deep Neural Networks for Speaker Diarization in the DIHARD Challenge","authors":"Ivan Himawan, M. Rahman, S. Sridharan, C. Fookes, A. Kanagasundaram","doi":"10.1109/SLT.2018.8639630","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639630","url":null,"abstract":"We investigate the use of deep neural networks (DNNs) for the speaker diarization task to improve performance under domain mismatched conditions. Three unsupervised domain adaptation techniques, namely inter-dataset variability compensation (IDVC), domain-invariant covariance normalization (DICN), and domain mismatch modeling (DMM), are applied on DNN based speaker embeddings to compensate for the mismatch in the embedding subspace. We present results conducted on the DIHARD data, which was released for the 2018 diarization challenge. Collected from a diverse set of domains, this data provides very challenging domain mismatched conditions for the diarization task. Our results provide insights into how the performance of our proposed system could be further improved.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115770219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639557
Yu Wang, J. H. M. Wong, M. Gales, K. Knill, A. Ragni
{"title":"Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment","authors":"Yu Wang, J. H. M. Wong, M. Gales, K. Knill, A. Ragni","doi":"10.1109/SLT.2018.8639557","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639557","url":null,"abstract":"A high performance automatic speech recognition (ASR) system is an important constituent component of an automatic language assessment system for free speaking language tests. The ASR system is required to be capable of recognising non-native spontaneous English speech and to be deployable under real-time conditions. The performance of ASR systems can often be significantly improved by leveraging upon multiple systems that are complementary, such as an ensemble. Ensemble methods, however, can be computationally expensive, often requiring multiple decoding runs, which makes them impractical for deployment. In this paper, a lattice-free implementation of sequence-level teacher-student training is used to reduce this computational cost, thereby allowing for real-time applications. This method allows a single student model to emulate the performance of an ensemble of teachers, but without the need for multiple decoding runs. Adaptations of the student model to speakers from different first languages (L1s) and grades are also explored.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"78 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114106196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Investigating the Downstream Impact of Grapheme-Based Acoustic Modeling on Spoken Utterance Classification
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639549
Ryan Price, Bhargav Srinivas Ch, Surbhi Singhal, S. Bangalore
{"title":"Investigating the Downstream Impact of Grapheme-Based Acoustic Modeling on Spoken Utterance Classification","authors":"Ryan Price, Bhargav Srinivas Ch, Surbhi Singhal, S. Bangalore","doi":"10.1109/SLT.2018.8639549","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639549","url":null,"abstract":"Automatic speech recognition (ASR) and natural language understanding are critical components of spoken language understanding (SLU) systems. One obstacle to providing services with SLU systems in multiple languages is the cost associated with acquiring all of the language-specific resources required for ASR in each language. Modeling graphemes eliminates the need to obtain a pronunciation dictionary which maps from speech sounds to words and is one way to reduce ASR resource dependencies when rapidly developing ASR in new languages. However, little is known about the downstream impact on SLU task performance when selecting graphemes as the acoustic modeling unit. This work investigates acoustic modeling for the ASR component of an SLU system using grapheme-based approaches together with convolutional and recurrent neural network architectures. We evaluate both ASR word accuracy and spoken utterance classification (SUC) accuracy for English, Italian and Spanish language tasks and find that it is possible to achieve SUC accuracy that is comparable to conventional phoneme-based systems which leverage a pronunciation dictionary.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126624652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Intelligence Is Asking The Right Question: A Study On Japanese Question Generation
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639526
Lasguido Nio, Koji Murakami
{"title":"Intelligence Is Asking The Right Question: A Study On Japanese Question Generation","authors":"Lasguido Nio, Koji Murakami","doi":"10.1109/SLT.2018.8639526","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639526","url":null,"abstract":"Traditional automatic question generation often requires hand-crafted templates or sophisticated NLP pipelines. Such approaches, however, require extensive labor and expertise to morphologically analyze the sentences and create the NLP framework. Our works aim to simplify these labors. We conduct a contrastive experiment between two types of sequence learning: statistical-based machine translation and attention-based sequence neural network. These models can be trained end-to-end, and it can capture the pattern between the input sequence and output sequence, thus diminishing the need to prepare a sophisticated NLP pipeline. Automatic evaluation results show that our system outperforms the state-of-the-art rule-based system, and also excels in terms of content quality and fluency according to a subjective human test.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126065692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Optimizing the Quality of Synthetically Generated Pseudowords for the Task of Minimal-Pair Distinction
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639037
Heiko Holz, Maria Chinkina, Laura Vetter
{"title":"Optimizing the Quality of Synthetically Generated Pseudowords for the Task of Minimal-Pair Distinction","authors":"Heiko Holz, Maria Chinkina, Laura Vetter","doi":"10.1109/SLT.2018.8639037","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639037","url":null,"abstract":"Training the distinction of vowel lengths or learning to differentiate between voiced and voiceless plosive sounds in form of minimal pair differentiation is one of the treatments fostering phonological awareness for people with reading and/or writing disabilities. While text-to-speech systems can automatically generate minimal pairs (e.g., bin and pin), the quality of the pronunciation of pseudowords is not always optimal. We present a novel approach for using text-to-speech tools to artificially generate the pronunciation of German pseudowords, which is evaluated in a crowdsourcing task of the discrimination of minimal pairs. While the input for generating audio files for real words is provided as plaintext, the audio files for pseudowords are generated from the SAMPA transcription, a computer-readable phonetic alphabet, of their real-word counterparts. The task of selecting the correct word from a minimal pair of a pseudoword and its lexical counterpart was completed equally successfully when a pseudoword was generated by our method or pronounced by a human (χ2(1) = 2.43, p = .119).","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126796106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploring End-To-End Attention-Based Neural Networks For Native Language Identification
2018 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2018-12-01 DOI: 10.1109/SLT.2018.8639689
Rutuja Ubale, Yao Qian, Keelan Evanini
{"title":"Exploring End-To-End Attention-Based Neural Networks For Native Language Identification","authors":"Rutuja Ubale, Yao Qian, Keelan Evanini","doi":"10.1109/SLT.2018.8639689","DOIUrl":"https://doi.org/10.1109/SLT.2018.8639689","url":null,"abstract":"Automatic identification of speakers’ native language (L1) based on their speech in a second language (L2) is a challenging research problem that can aid several spoken language technologies such as automatic speech recognition (ASR), speaker recognition, and voice biometrics in interactive voice applications. End-to-end learning, in which the features and the classification model are learned jointly in a single system, is an emerging field in the areas of speech recognition, speaker verification and spoken language understanding. In this paper, we present our study on attention-based end-to-end modeling for native language identification on a database of 11 different L1s. Using this methodology, we can determine the native language of the speaker directly from the raw acoustic features. Experimental results from our study show that our best end-to-end model can achieve promising results by capturing speech commonalities across L1s using an attention mechanism. In addition, fusion of proposed systems with the baseline system leads to significant performance improvements.","PeriodicalId":377307,"journal":{"name":"2018 IEEE Spoken Language Technology Workshop (SLT)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121568745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13