Personalized word representations carrying personalized semantics learned from social network posts
Zih-Wei Lin, Tzu-Wei Sung, Hung-yi Lee, Lin-Shan Lee
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/ASRU.2017.8268982
Abstract: Distributed word representations have been shown to be very useful in various natural language processing (NLP) tasks. Word vectors learned from huge corpora very often carry both semantic and syntactic information about words. However, it is well known that each individual user has his or her own language patterns, shaped by factors such as topics of interest, friend groups, social activities, and wording habits, which may imply a kind of personalized semantics. With such personalized semantics, the same word may mean slightly different things to different users. For example, the word "Cappuccino" may suggest "Leisure", "Joy", or "Excellent" to a user who enjoys coffee, but only a kind of drink to someone else. Such personalized semantics of course cannot be carried by standard universal word vectors trained on huge corpora produced by many people. In this paper, we propose a framework to train different personalized word vectors for different users, based on the very successful continuous skip-gram model, using social network data posted by many individual users. In this framework, universal background word vectors are first learned from background corpora and then adapted on each individual user's personalized corpus to learn that user's personalized word vectors. We use two application tasks to evaluate the quality of the personalized word vectors obtained in this way: user prediction and sentence completion. The personalized word vectors are shown to carry personalized semantics and to offer improved performance on these two evaluation tasks.

Streaming small-footprint keyword spotting using sequence-to-sequence models
Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, A. Bakhtin, Ian McGraw
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/ASRU.2017.8268974
Abstract: We develop streaming keyword spotting systems using a recurrent neural network transducer (RNN-T) model: an all-neural, end-to-end trained, sequence-to-sequence model which jointly learns acoustic and language model components. Our models are trained to predict either phonemes or graphemes as subword units, thus allowing us to detect arbitrary keyword phrases, without any out-of-vocabulary words. In order to adapt the models to the requirements of keyword spotting, we propose a novel technique which biases the RNN-T system towards a specific keyword of interest. Our systems are compared against a strong sequence-trained, connectionist temporal classification (CTC) based "keyword-filler" baseline, which is augmented with a separate phoneme language model. Overall, our RNN-T system with the proposed biasing technique significantly improves performance over the baseline system.
{"title":"Spoken language biomarkers for detecting cognitive impairment","authors":"Tuka Alhanai, R. Au, James R. Glass","doi":"10.1109/ASRU.2017.8268965","DOIUrl":"https://doi.org/10.1109/ASRU.2017.8268965","url":null,"abstract":"In this study we developed an automated system that evaluates speech and language features from audio recordings of neuropsychological examinations of 92 subjects in the Framingham Heart Study. A total of 265 features were used in an elastic-net regularized binomial logistic regression model to classify the presence of cognitive impairment, and to select the most predictive features. We compared performance with a demographic model from 6,258 subjects in the greater study cohort (0.79 AUC), and found that a system that incorporated both audio and text features performed the best (0.92 AUC), with a True Positive Rate of 29% (at 0% False Positive Rate) and a good model fit (Hosmer-Lemeshow test > 0.05). We also found that decreasing pitch and jitter, shorter segments of speech, and responses phrased as questions were positively associated with cognitive impairment.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition","authors":"Bowen Shi, Karen Livescu","doi":"10.1109/ASRU.2017.8268962","DOIUrl":"https://doi.org/10.1109/ASRU.2017.8268962","url":null,"abstract":"We address the problem of automatic American Sign Language fingerspelling recognition from video. Prior work has largely relied on frame-level labels, hand-crafted features, or other constraints, and has been hampered by the scarcity of data for this task. We introduce a model for fingerspelling recognition that addresses these issues. The model consists of an auto-encoder-based feature extractor and an attention-based neural encoder-decoder, which are trained jointly. The model receives a sequence of image frames and outputs the fingerspelled word, without relying on any frame-level training labels or hand-crafted features. In addition, the auto-encoder subcomponent makes it possible to leverage unlabeled data to improve the feature learning. The model achieves 11.6% and 4.4% absolute letter accuracy improvement respectively in signer-independent and signer-adapted fingerspelling recognition over previous approaches that required frame-level training labels.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128637821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Dynamic time-aware attention to speaker roles and contexts for spoken language understanding
Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung (Vivian) Chen
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/ASRU.2017.8268985
Abstract: Spoken language understanding (SLU) is an essential component in conversational systems. Most SLU components treat each utterance independently, and the following components then aggregate multi-turn information in separate phases. In order to avoid error propagation and effectively utilize contexts, prior work leveraged history for contextual SLU. However, previous models only paid attention to the content of history utterances without considering their temporal information or speaker roles. In a dialogue, the most recent utterances should matter more than the least recent ones. Furthermore, users usually attend to 1) their own history for reasoning and 2) others' utterances for listening, so the speaker of an utterance may provide informative cues that help understanding. Therefore, this paper proposes an attention-based network that additionally leverages temporal information and speaker roles for better SLU, where the attention to contexts and speaker roles is learned automatically in an end-to-end manner. Experiments on the benchmark Dialogue State Tracking Challenge 4 (DSTC4) dataset show that the time-aware dynamic role attention networks significantly improve understanding performance.

UTD-CRSS submission for MGB-3 Arabic dialect identification: Front-end and back-end advancements on broadcast speech
A. Bulut, Qian Zhang, Chunlei Zhang, F. Bahmaninezhad, J. Hansen
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/ASRU.2017.8268958
Abstract: This study presents the systems submitted by the University of Texas at Dallas, Center for Robust Speech Systems (UTD-CRSS) to the MGB-3 Arabic Dialect Identification (ADI) subtask. The task is to discriminate between five dialects of Arabic: Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic. We develop multiple single systems with different front-end representations and back-end classifiers. At the front-end level, feature extraction methods such as Mel-frequency cepstral coefficients (MFCCs) and two types of bottleneck features (BNF) are studied within an i-vector framework. At the back-end level, Gaussian back-end (GB) and generative adversarial network (GAN) classifiers are applied as alternatives. The best (contrastive) submission achieves an accuracy of 76.94% on the ADI subtask by augmenting with a randomly chosen part of the development dataset. With a post-evaluation correction to the submitted system, the final accuracy increases to 79.76%, which represents the best performance achieved so far on the challenge test dataset.
{"title":"Attention-based Wav2Text with feature transfer learning","authors":"Andros Tjandra, S. Sakti, Satoshi Nakamura","doi":"10.1109/ASRU.2017.8268951","DOIUrl":"https://doi.org/10.1109/ASRU.2017.8268951","url":null,"abstract":"Conventional automatic speech recognition (ASR) typically performs multi-level pattern recognition tasks that map the acoustic speech waveform into a hierarchy of speech units. But, it is widely known that information loss in the earlier stage can propagate through the later stages. After the resurgence of deep learning, interest has emerged in the possibility of developing a purely end-to-end ASR system from the raw waveform to the transcription without any predefined alignments and hand-engineered models. However, the successful attempts in end-to-end architecture still used spectral-based features, while the successful attempts in using raw waveform were still based on the hybrid deep neural network — Hidden Markov model (DNN-HMM) framework. In this paper, we construct the first end-to-end attention-based encoder-decoder model to process directly from raw speech waveform to the text transcription. We called the model as Attention-based Wav2Text. To assist the training process of the end-to-end model, we propose to utilize a feature transfer learning. Experimental results also reveal that the proposed Attention-based Wav2Text model directly with raw waveform could achieve a better result in comparison with the attentional encoder-decoder model trained on standard front-end filterbank features.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128635662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Mitigating the impact of speech recognition errors on chatbot using sequence-to-sequence model
Pin-Jung Chen, I-Hung Hsu, Yi Yao Huang, Hung-yi Lee
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). https://doi.org/10.1109/ASRU.2017.8268977
Abstract: We apply a sequence-to-sequence model to mitigate the impact of speech recognition errors on open-domain, end-to-end dialogue generation. We cast the task as a domain adaptation problem in which ASR transcriptions and the original texts are two different domains. Our proposed model includes an individual encoder for each domain and makes their hidden states similar, so that the decoder predicts the same dialogue text from either input. The method demonstrates that the sequence-to-sequence model can learn that an ASR transcription and its original text have the same meaning, and thereby mitigate the speech recognition errors. Experimental results on the Cornell movie dialog dataset demonstrate that the domain adaptation system helps the spoken dialogue system generate responses that are more similar to the original text answers.
{"title":"Speech recognition challenge in the wild: Arabic MGB-3","authors":"Ahmed M. Ali, S. Vogel, S. Renals","doi":"10.1109/ASRU.2017.8268952","DOIUrl":"https://doi.org/10.1109/ASRU.2017.8268952","url":null,"abstract":"This paper describes the Arabic MGB-3 Challenge — Arabic Speech Recognition in the Wild. Unlike last year's Arabic MGB-2 Challenge, for which the recognition task was based on more than 1,200 hours broadcast TV news recordings from Aljazeera Arabic TV programs, MGB-3 emphasises dialectal Arabic using a multi-genre collection of Egyptian YouTube videos. Seven genres were used for the data collection: comedy, cooking, family/kids, fashion, drama, sports, and science (TEDx). A total of 16 hours of videos, split evenly across the different genres, were divided into adaptation, development and evaluation data sets. The Arabic MGB-Challenge comprised two tasks: A) Speech transcription, evaluated on the MGB-3 test set, along with the 10 hour MGB-2 test set to report progress on the MGB-2 evaluation; B) Arabic dialect identification, introduced this year in order to distinguish between four major Arabic dialects — Egyptian, Levantine, North African, Gulf, as well as Modern Standard Arabic. Two hours of audio per dialect were released for development and a further two hours were used for evaluation. For dialect identification, both lexical features and i-vector bottleneck features were shared with participants in addition to the raw audio recordings. Overall, thirteen teams submitted ten systems to the challenge. We outline the approaches adopted in each system, and summarise the evaluation results.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124944101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WERD: Using social text spelling variants for evaluating dialectal speech recognition","authors":"Ahmed M. Ali, Preslav Nakov, P. Bell, S. Renals","doi":"10.1109/ASRU.2017.8268928","DOIUrl":"https://doi.org/10.1109/ASRU.2017.8268928","url":null,"abstract":"We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121777371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}