Proceedings. IEEE Workshop on Automatic Speech Recognition and Understanding — Latest Publications

Multimodal embedding fusion for robust speaker role recognition in video broadcast
Pub Date: 2015-01-01 · DOI: 10.1109/ASRU.2015.7404820
Mickael Rouvier, Sebastien Delecraz, Benoit Favre, Meriem Bendris, Frédéric Béchet
Abstract: Person role recognition in video broadcasts consists of classifying people into roles such as anchor, journalist, or guest. Existing approaches mostly consider a single modality, either audio (speaker role recognition) or image (shot role recognition), first because the two modalities are not synchronized, and second because no video corpus has been annotated in both. Deep Neural Network (DNN) approaches offer the ability to learn feature representations (embeddings) and classification functions simultaneously. This paper presents a multimodal fusion of audio, text, and image embedding spaces for speaker role recognition in asynchronous data. Monomodal embeddings are trained on exogenous data and fine-tuned with a DNN on a 70-hour French broadcast corpus for the target task. Experiments on the REPERE corpus show the benefit of embedding-level fusion over both the monomodal embedding systems and the standard late-fusion method.
Pages: 383-389 · Citations: 4
Spoken dialogue systems: Challenges, and opportunities for research
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5372951
J. Williams
Abstract: Research into spoken dialog systems has yielded some interesting results recently, such as statistical models for improved robustness and machine learning for optimal control, among others. What are the basic ideas behind these techniques? What opportunities do they exploit? Are they ready to be deployed in real systems? What remains to be done?
Pages: 25 · Citations: 25
Trends and challenges in language modeling for speech recognition and machine translation
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373531
Holger Schwenk
Abstract: Language models play an important role in large vocabulary continuous speech recognition (LVCSR) systems and statistical approaches to machine translation (SMT), in particular when modeling morphologically rich languages. Despite intensive research over more than 20 years, state-of-the-art LVCSR and SMT systems seem to use only one dominant approach: n-gram back-off language models. This talk first reviews the most important approaches to language modeling. I then discuss some of the recent trends and challenges for the future. An interesting alternative to the back-off n-gram approach is the so-called continuous space methods, whose basic idea is to perform the probability estimation in a continuous space; by these means, better probability estimates for unseen word sequences can be expected. There is also a relatively large body of work on adaptive language models: the adaptation can aim to tailor a language model to a particular task or domain, or it can be performed over time. Another very active research area is discriminative language models. Finally, I will review the challenges and benefits of language models trained on very large amounts of training material.
Pages: 23 · Citations: 3
Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373530
G. Potamianos
Abstract: The presentation will provide an overview of the main research achievements and the state-of-the-art in audiovisual speech processing, focusing mainly on audio-visual automatic speech recognition. The topic has been of interest to the speech research community because of the visual modality's potential for increased robustness to acoustic noise. Nevertheless, significant challenges remain that have hindered practical applications of the technology, most notably difficulties with visual speech information extraction and with audio-visual fusion algorithms that remain robust to the audio-visual environment variability inherent in practical, unconstrained interaction scenarios and audio-visual data sources, for example multiparty interaction in smart spaces, broadcast news, etc. These challenges are shared across a number of interesting audio-visual speech technologies beyond the core speech recognition problem, where the visual modality has the potential to resolve ambiguity inherent in the audio signal alone: for example, speech activity detection, speaker diarization, and source separation.
Pages: 22 · Citations: 10
Rapid language adaptation tools for multilingual speech processing
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373503
Tanja Schultz
Abstract: The performance of speech and language processing technologies has improved dramatically over the past decade, with an increasing number of systems being deployed in a large variety of applications, such as spoken dialog systems, speech summarization and information retrieval systems, and speech translation systems. Most efforts to date have focused on a very small number of languages with large numbers of speakers, economic potential, and information technology needs among the population. However, speech technology has a lot to contribute even to languages that do not fall into this category. Languages with a small number of speakers and few linguistic resources may suddenly become of interest for humanitarian and military reasons. Furthermore, a large number of languages are in danger of becoming extinct, and ongoing projects for preserving them could benefit from speech technology. With more than 6900 languages in the world and the need to support multiple input and output languages, the most important challenge today is to port speech processing systems to new languages rapidly and at reasonable cost. In my talk I will introduce state-of-the-art techniques for rapid language adaptation and present solutions to overcome the ever-present problem of data sparseness and the gap between language and technology expertise. I will describe the process of building speech and language processing components for new, unsupported languages and introduce tools to do this rapidly and at low cost. I describe the Rapid Language Adaptation Tools (RLAT), which build on existing projects such as SPICE, GlobalPhone, and FestVox and enable users to develop speech processing components, to collect appropriate speech and text data for building and improving these components, and to evaluate the results, allowing for iterative improvements.
Pages: 51 · Citations: 5
Acoustic modelling for speech recognition: Hidden Markov models and beyond?
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5372953
M. Gales
Abstract: Hidden Markov models (HMMs) are still the dominant form of acoustic model used in automatic speech recognition (ASR) systems. However, over the years the form, and training, of the HMM for ASR have been extended and modified, so that the current forms used in state-of-the-art speech recognition systems are very different from those originally proposed thirty years ago. This talk will review two of the more important extensions that have been proposed over the years: discriminative training, and speaker and environment adaptation. The use of discriminative training is now common, with forms based on minimum Bayes risk training and minimum classification error being applied to systems trained on many hundreds of hours of speech data. The talk will describe these current approaches, as well as discussing the current trends towards schemes based on large-margin training approaches. Linear-transform-based speaker adaptation is the dominant form of speaker adaptation. Current approaches, including extensions to linear transforms and model-based noise robustness techniques, and trends will also be described. Details of the various forms of the adaptation/noise transformation, training criteria, and approaches for adaptive training will be given. The final part of the talk will discuss research beyond the current HMM framework. Schemes based on both discriminative models and functions, as well as non-parametric approaches, will be described.
Pages: 44 · Citations: 8
Online discriminative learning: theory and applications
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373501
N. Cesa-Bianchi
Abstract: Online discriminative learning has been successfully applied to various speech and natural language processing tasks, including classification, parsing, translation, and speech recognition/generation. In addition to their simplicity and scalability, online learning algorithms are natural tools in applications involving human-computer interaction, such as computer-assisted translation. In this talk we describe some of the most popular online learning algorithms, and mention their connection with the solution of convex optimization problems. In order to cope with problems where the human feedback comes at a cost, we also illustrate some simple techniques for designing online algorithms that work in semi-supervised mode (active learning). We then discuss the game-theoretic nature of online performance analysis, which explains the robustness to noise exhibited by these algorithms. Finally, we mention some of the latest research developments and future challenges in the online learning domain.
Pages: 45 · Citations: 0
It's not you, it's me: Automatically extracting social meaning from speed dates
Pub Date: 2009-12-01 · DOI: 10.1109/ASRU.2009.5373494
Dan Jurafsky
Abstract: Automatically detecting human social intentions from spoken conversation is an important task for social computing and for dialogue systems. We describe a system for detecting elements of interactional style: whether a speaker is awkward, friendly, or flirtatious. We create and use a new spoken corpus of 991 4-minute speed-dates. Participants rated themselves and each other for these elements of style. Using rich dialogue, lexical, and prosodic features, we are able to detect flirtatious, awkward, and friendly styles in noisy natural conversational data with above 70% accuracy, significantly outperforming not only the baseline but also, for flirtation, outperforming the human interlocutors. We find that features like pitch, energy, and the use of emotional vocabulary help detect flirtation; collaborative conversational style (laughter, questions, collaborative completions) helps in detecting friendliness; and disfluencies help in detecting awkwardness. In analyzing why our system outperforms humans, we show that humans are very poor perceivers of flirtatiousness in this task, and instead often project their own intended behavior onto their interlocutors. This talk describes joint work with Dan McFarland (School of Education) and Rajesh Ranganath (Computer Science Department).
Pages: 11 · Citations: 0
Supervised and Unsupervised Feature Selection for Inferring Social Nature of Telephone Conversations from Their Content
Pub Date: 2008-04-03 (Epub 2003-10-13) · DOI: 10.1109/ICCV.2003.1238369
Anthony Stark, Izhak Shafran, Jeffrey Kaye
Abstract: The ability to reliably infer the nature of telephone conversations opens up a variety of applications, ranging from designing context-sensitive user interfaces on smartphones, to providing new tools for social psychologists and social scientists to study and understand the social life of different subpopulations within different contexts. Using a unique corpus of everyday telephone conversations collected from eight residences over the duration of a year, we investigate the utility of popular features, extracted solely from the content, in distinguishing business-oriented calls from others. Through feature selection experiments, we find that the discrimination can be performed robustly for a majority of the calls using a small set of features. Remarkably, features learned from unsupervised methods, specifically latent Dirichlet allocation, perform almost as well as those from supervised methods. The unsupervised clusters learned in this task show promise for finer-grained inference of the social nature of telephone conversations.
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3384521/pdf/nihms-349265.pdf
Pages: 378-384 · Citations: 0
Spoken language understanding: a survey
Pub Date: 2007-12-01 · DOI: 10.1109/ASRU.2007.4430139
R. Mori
Abstract: A survey of research on spoken language understanding is presented. It covers aspects of knowledge representation, automatic interpretation strategies, semantic grammars, conceptual language models, semantic event detection, shallow semantic parsing, semantic classification, semantic confidence, and active learning.
Pages: 365-376 · Citations: 34