Proceedings. IEEE Workshop on Automatic Speech Recognition and Understanding — Latest Publications

Multimodal embedding fusion for robust speaker role recognition in video broadcast
Pub Date: 2015-01-01 · DOI: 10.1109/ASRU.2015.7404820
Mickael Rouvier, Sebastien Delecraz, Benoit Favre, Meriem Bendris, Frédéric Béchet
Abstract: Person role recognition in video broadcasts consists of classifying people into roles such as anchor, journalist, or guest. Existing approaches mostly consider a single modality, either audio (speaker role recognition) or image (shot role recognition), first because the two modalities are not synchronized, and second because no video corpus has been annotated in both. Deep Neural Network (DNN) approaches offer the ability to learn feature representations (embeddings) and classification functions simultaneously. This paper presents a multimodal fusion of audio, text, and image embedding spaces for speaker role recognition in asynchronous data. Monomodal embeddings are trained on exogenous data and fine-tuned with a DNN on a 70-hour French broadcast corpus for the target task. Experiments on the REPERE corpus show the benefit of embedding-level fusion over both the monomodal embedding systems and the standard late-fusion method.
Pages: 383-389 · Citations: 4
Spoken dialogue systems: Challenges, and opportunities for research
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5372951
J. Williams
Abstract: Research into spoken dialog systems has yielded some interesting results recently, such as statistical models for improved robustness and machine learning for optimal control, among others. What are the basic ideas behind these techniques? What opportunities do they exploit? Are they ready to be deployed in real systems? What remains to be done?
Pages: 25 · Citations: 25
Trends and challenges in language modeling for speech recognition and machine translation
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373531
Holger Schwenk
Abstract: Language models play an important role in large vocabulary continuous speech recognition (LVCSR) systems and statistical approaches to machine translation (SMT), in particular when modeling morphologically rich languages. Despite intensive research over more than 20 years, state-of-the-art LVCSR and SMT systems seem to use only one dominant approach: n-gram back-off language models. This talk first reviews the most important approaches to language modeling. I then discuss some of the recent trends and challenges for the future. An interesting alternative to the back-off n-gram approach is the so-called continuous space methods, whose basic idea is to perform the probability estimation in a continuous space; by these means, better probability estimates for unseen word sequences can be expected. There is also a relatively large body of work on adaptive language models: the adaptation can aim to tailor a language model to a particular task or domain, or it can be performed over time. Another very active research area is discriminative language models. Finally, I will review the challenges and benefits of language models trained on very large amounts of training material.
Pages: 23 · Citations: 3
Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373530
G. Potamianos
Abstract: The presentation will provide an overview of the main research achievements and the state-of-the-art in audiovisual speech processing, focusing mainly on audio-visual automatic speech recognition. The topic has been of interest to the speech research community because of the visual modality's potential for increased robustness to acoustic noise. Nevertheless, significant challenges remain that have hindered practical applications of the technology, most notably difficulties with visual speech information extraction and with audio-visual fusion algorithms that remain robust to the audio-visual environment variability inherent in practical, unconstrained interaction scenarios and audio-visual data sources, for example multiparty interaction in smart spaces, broadcast news, etc. These challenges are shared across a number of interesting audio-visual speech technologies beyond the core speech recognition problem, where the visual modality has the potential to resolve ambiguity inherent in the audio signal alone: for example, speech activity detection, speaker diarization, and source separation.
Pages: 22 · Citations: 10
Rapid language adaptation tools for multilingual speech processing
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373503
Tanja Schultz
Abstract: The performance of speech and language processing technologies has improved dramatically over the past decade, with an increasing number of systems being deployed in a large variety of applications, such as spoken dialog systems, speech summarization and information retrieval systems, and speech translation systems. Most efforts to date have focused on a very small number of languages with large numbers of speakers, economic potential, and information technology needs among the population. However, speech technology has a lot to contribute even to languages that do not fall into this category. Languages with a small number of speakers and few linguistic resources may suddenly become of interest for humanitarian and military reasons. Furthermore, a large number of languages are in danger of becoming extinct, and ongoing projects for preserving them could benefit from speech technology. With more than 6900 languages in the world and the need to support multiple input and output languages, the most important challenge today is to port speech processing systems to new languages rapidly and at reasonable cost. In my talk I will introduce state-of-the-art techniques for rapid language adaptation and present solutions to overcome the ever-present problem of data sparseness and the gap between language and technology expertise. I will describe the process of building speech and language processing components for new, unsupported languages and introduce tools to do this rapidly and at low cost. I describe the Rapid Language Adaptation Tools (RLAT), which build on existing projects such as SPICE, GlobalPhone, and FestVox and enable users to develop speech processing components, to collect appropriate speech and text data for building and improving these components, and to evaluate the results, allowing for iterative improvements.
Pages: 51 · Citations: 5
Acoustic modelling for speech recognition: Hidden Markov models and beyond?
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5372953
M. Gales
Abstract: Hidden Markov models (HMMs) are still the dominant form of acoustic model used in automatic speech recognition (ASR) systems. However, over the years the form, and training, of the HMM for ASR have been extended and modified, so that the current forms used in state-of-the-art speech recognition systems are very different from those originally proposed thirty years ago. This talk will review two of the more important extensions that have been proposed over the years: discriminative training, and speaker and environment adaptation. The use of discriminative training is now common, with forms based on minimum Bayes risk training and minimum classification error being applied to systems trained on many hundreds of hours of speech data. The talk will describe these current approaches, as well as discussing the current trends towards schemes based on large-margin training approaches. Linear-transform-based speaker adaptation is the dominant form of speaker adaptation. Current approaches, including extensions to linear transforms and model-based noise robustness techniques, and trends will also be described. Details of the various forms of the adaptation/noise transformation, training criteria, and approaches for adaptive training will be given. The final part of the talk will discuss research beyond the current HMM framework. Schemes based on both discriminative models and functions, as well as non-parametric approaches, will be described.
Pages: 44 · Citations: 8
Online discriminative learning: theory and applications
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373501
N. Cesa-Bianchi
Abstract: Online discriminative learning has been successfully applied to various speech and natural language processing tasks, including classification, parsing, translation, and speech recognition/generation. In addition to their simplicity and scalability, online learning algorithms are natural tools in applications involving human-computer interaction, such as computer-assisted translation. In this talk we describe some of the most popular online learning algorithms, and mention their connection with the solution of convex optimization problems. In order to cope with problems where the human feedback comes at a cost, we also illustrate some simple techniques for designing online algorithms that work in semi-supervised mode (active learning). We then discuss the game-theoretic nature of online performance analysis, which explains the robustness to noise exhibited by these algorithms. Finally, we mention some of the latest research developments and future challenges in the online learning domain.
Pages: 45 · Citations: 0
It's not you, it's me: Automatically extracting social meaning from speed dates
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373494
Dan Jurafsky
Abstract: Automatically detecting human social intentions from spoken conversation is an important task for social computing and for dialogue systems. We describe a system for detecting elements of interactional style: whether a speaker is awkward, friendly, or flirtatious. We create and use a new spoken corpus of 991 4-minute speed-dates. Participants rated themselves and each other for these elements of style. Using rich dialogue, lexical, and prosodic features, we are able to detect flirtatious, awkward, and friendly styles in noisy natural conversational data with above 70% accuracy, significantly outperforming not only the baseline but also, for flirtation, outperforming the human interlocutors. We find that features like pitch, energy, and the use of emotional vocabulary help detect flirtation; collaborative conversational style (laughter, questions, collaborative completions) helps in detecting friendliness; and disfluencies help in detecting awkwardness. In analyzing why our system outperforms humans, we show that humans are very poor perceivers of flirtatiousness in this task, and instead often project their own intended behavior onto their interlocutors. This talk describes joint work with Dan McFarland (School of Education) and Rajesh Ranganath (Computer Science Department).
Pages: 11 · Citations: 0
Supervised and Unsupervised Feature Selection for Inferring Social Nature of Telephone Conversations from Their Content
Pub Date: 2008-04-03 (Epub 2003-10-13) · DOI: 10.1109/ICCV.2003.1238369
Anthony Stark, Izhak Shafran, Jeffrey Kaye
Abstract: The ability to reliably infer the nature of telephone conversations opens up a variety of applications, ranging from designing context-sensitive user interfaces on smartphones, to providing new tools for social psychologists and social scientists to study and understand the social life of different subpopulations within different contexts. Using a unique corpus of everyday telephone conversations collected from eight residences over the duration of a year, we investigate the utility of popular features, extracted solely from the content, in distinguishing business-oriented calls from others. Through feature selection experiments, we find that the discrimination can be performed robustly for a majority of the calls using a small set of features. Remarkably, features learned from unsupervised methods, specifically latent Dirichlet allocation, perform almost as well as those from supervised methods. The unsupervised clusters learned in this task show promise for finer-grained inference of the social nature of telephone conversations.
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3384521/pdf/nihms-349265.pdf
Pages: 378-384 · Citations: 0
Spoken language understanding: a survey
Pub Date: 2007-12-01 · DOI: 10.1109/ASRU.2007.4430139
R. Mori
Abstract: A survey of research on spoken language understanding is presented. It covers aspects of knowledge representation, automatic interpretation strategies, semantic grammars, conceptual language models, semantic event detection, shallow semantic parsing, semantic classification, semantic confidence, and active learning.
Pages: 365-376 · Citations: 34