Latest publications from the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding

New perspectives on spoken language understanding: Does machine need to fully understand speech?
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373502
Tatsuya Kawahara
{"title":"New perspectives on spoken language understanding: Does machine need to fully understand speech?","authors":"Tatsuya Kawahara","doi":"10.1109/ASRU.2009.5373502","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373502","url":null,"abstract":"Spoken Language Understanding (SLU) has been traditionally formulated to extract meanings or concepts of user utterances in the context of human-machine dialogue. With the broadened coverage of spoken language processing, the tasks and methodologies of SLU have been changed accordingly. The back-end of spoken dialogue systems now consist of not only relational databases (RDB) but also general documents, incorporating information retrieval (IR) and question-answering (QA) techniques. This paradigm shift and the author's approaches are reviewed. SLU is also being designed to cover human-human dialogues and multi-party conversations. Major approaches to “understand” human-human speech communication and a new approach based on the lister's reactions are reviewed. As a whole, these trends are apparently not oriented for full understanding of spoken language, but for robust extraction of clue information.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125304787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
A hierarchical structure for modeling inter and intra phonetic information for phoneme recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373272
D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker
{"title":"A hierarchical structure for modeling inter and intra phonetic information for phoneme recognition","authors":"D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker","doi":"10.1109/ASRU.2009.5373272","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373272","url":null,"abstract":"In this paper, we present a two-layer hierarchical structure based on neural networks for phoneme recognition. The proposed structure attempts to model only the characteristics within a phoneme, i.e., intra-phonetic information. This differs from other state-of-the-art hierarchical structures where the first layer typically models the intra-phonetic information while the second layer focuses on modeling the contextual (inter-phonetic) information. An advantage of the proposed model is that it can be added to another layer that focuses on the inter-phonetic information. In this paper, we also show that the categorization between intra- and inter-phonetic information also allows to extend other state-of-the-art hierarchical approaches. A phoneme accuracy of 77.89% is achieved on the TIMIT database, which compares favorably to the best results obtained on this database.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131313941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Topic-based speaker recognition for German parliamentary speeches
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372907
Doris Baum
{"title":"Topic-based speaker recognition for German parliamentary speeches","authors":"Doris Baum","doi":"10.1109/ASRU.2009.5372907","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372907","url":null,"abstract":"In the last decade, high-level features for speaker recognition have become a research focus, as they are believed to alleviate the weak point of the classical spectral/cepstral-feature-based approaches: mismatch in acoustic conditions or channel between training and test data. Identification cues such as prosody, pronunciation, and idiolect have been successfully investigated. Semantic speaker recognition, such as identifying people by the topics they frequently talk about, has not found an equal amount of attention. However, it is a promising approach, especially for broadcast data and multimedia archives, where prominent speakers can be expected to often talk about their specific subjects. This paper reports on our experiments with topic-based speaker recognition on German parliamentary speeches. Text transcripts of speeches of federal ministers were used to train speaker models based on word frequencies. For recognition, these models were applied to automatic speech recognition transcripts of parliamentary speeches and could identify the correct speaker surprisingly well, with an EER of 13.8%. Fusing this approach with a classical GMM-UBM system (with EER 14.3%) yields an improved EER of 8.6%.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122101260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Extractive speech summarization by active learning
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373269
J. Zhang, R. Chan, Pascale Fung
{"title":"Extractive speech summarization by active learning","authors":"J. Zhang, R. Chan, Pascale Fung","doi":"10.1109/ASRU.2009.5373269","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373269","url":null,"abstract":"In this paper, we propose an active learning approach for feature-based extractive summarization of lecture speech. Most state-of-the-art speech summarization systems are trained by using a large amount of human reference summaries. Active learning targets to minimize human annotation efforts by automatically selecting a small amount of unlabeled examples for labeling. Our method chooses the unlabeled examples according to a combination of informativeness criterion and robustness criterion. Our summarization results show an increasing learning curve of ROUGE-L F-measure, from 0.44 to 0.54, consistently higher than that of using randomly chosen training samples. We also show that, by following the rhetorical structure in presentation slides, it is possible for humans to produce Ȝgold standardȝ reference summaries with very high inter-labeler agreement.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114606520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Representing the Reinforcement Learning state in a negotiation dialogue
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373413
P. Heeman
{"title":"Representing the Reinforcement Learning state in a negotiation dialogue","authors":"P. Heeman","doi":"10.1109/ASRU.2009.5373413","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373413","url":null,"abstract":"Most applications of Reinforcement Learning (RL) for dialogue have focused on slot-filling tasks. In this paper, we explore a task that requires negotiation, in which conversants need to exchange information in order to decide on a good solution. We investigate what information should be included in the system's RL state so that an optimal policy can be learned and so that the state space stays reasonable in size. We propose keeping track of the decisions that the system has made, and using them to constrain the system's future behavior in the dialogue. In this way, we can compositionally represent the strategy that the system is employing. We show that this approach is able to learn a good policy for the task. This work is a first step to a more general exploration of applying RL to negotiation dialogues.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130189230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Voice-based information retrieval — how far are we from the text-based information retrieval?
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372952
Lin-Shan Lee, Yi-Cheng Pan
{"title":"Voice-based information retrieval — how far are we from the text-based information retrieval ?","authors":"Lin-Shan Lee, Yi-Cheng Pan","doi":"10.1109/ASRU.2009.5372952","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372952","url":null,"abstract":"Although network content access is primarily text-based today, almost all roles of text can be accomplished by voice. Voice-based information retrieval refers to the situation that the user query and/or the content to be retried are in form of voice. This paper tries to compare the voice-based information retrieval with the currently very successful text-based information retrieval, and identifies two major issues in which voice-based information retrieval is far behind: retrieval accuracy and user-system interaction. These two issues are reviewed, analyzed and discussed in detail. It is found that very good approaches have been proposed and very good improvements have been achieved, although there is still a very long way to go. A few successful prototype systems, among many others are presented at the end.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132006832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
On speeding phoneme recognition in a hierarchical MLP structure
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373278
D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker
{"title":"On speeding phoneme recognition in a hierarchical MLP structure","authors":"D. Vásquez, Guillermo Aradilla, R. Gruhn, W. Minker","doi":"10.1109/ASRU.2009.5373278","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373278","url":null,"abstract":"In this paper, we propose a technique for speeding phoneme recognition in a hierarchical structure involving multilayered perceptrons (MLPs). The hierarchical structure consists of two MLP-based layers, where the output of the first layer is used as input for the second layer. In this paper, we efficiently speed up the system by removing the redundant information contained at the output of the first layer. Several techniques are investigated for removing this redundant information based on temporal and phonetic criteria. The best approach reduces the computational time by 57% while keeping a system accuracy comparable to the standard hierarchical approach. This scheme favors the implementation of such hierarchical structures in real-time applications.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123503066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pronunciation modeling for dialectal Arabic speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373245
Hassan Al-Haj, Roger Hsiao, Ian Lane, A. Black, A. Waibel
{"title":"Pronunciation modeling for dialectal arabic speech recognition","authors":"Hassan Al-Haj, Roger Hsiao, Ian Lane, A. Black, A. Waibel","doi":"10.1109/ASRU.2009.5373245","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373245","url":null,"abstract":"Short vowels in Arabic are normally omitted in written text which leads to ambiguity in the pronunciation. This is even more pronounced for dialectal Arabic where a single word can be pronounced quite differently based on the speaker's nationality, level of education, social class and religion. In this paper we focus on pronunciation modeling for Iraqi-Arabic speech. We introduce multiple pronunciations into the Iraqi speech recognition lexicon, and compare the performance, when weights computed via forced alignment are assigned to the different pronunciations of a word. Incorporating multiple pronunciations improved recognition accuracy compared to a single pronunciation baseline and introducing pronunciation weights further improved performance. Using these techniques an absolute reduction in word-error-rate of 2.4% was obtained compared to the baseline system.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128467145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Sub-structure-based estimation of pronunciation proficiency and classification of learners
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373275
Masayuki Suzuki, N. Minematsu, Dean Luo, K. Hirose
{"title":"Sub-structure-based estimation of pronunciation proficiency and classification of learners","authors":"Masayuki Suzuki, N. Minematsu, Dean Luo, K. Hirose","doi":"10.1109/ASRU.2009.5373275","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373275","url":null,"abstract":"Automatic estimation of pronunciation proficiency has its specific difficulty. Adequacy in controlling the vocal organs can be estimated from spectral envelopes of input utterances but the envelope patterns are also affected easily by different speakers. To develop a pedagogically sound method for automatic estimation, the envelope changes caused by linguistic factors and those by extra-linguistic factors should be properly separated. For this aim, in our previous study [1], we proposed a mathematically-guaranteed and linguistically-valid speaker-invariant representation of pronunciation, called speech structure. After the proposal, we have examined that representation also for ASR [2], [3], [4] and, through these works, we have learned better how to apply speech structures to various tasks. In this paper, we focus on a proficiency estimation experiment done in [1] and, based on our recently proposed techniques for the structures, we carry out that experiment again but under new and different conditions. Here, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show that correlations between human and machine rating are improved and also show extremely higher robustness to speaker differences compared to widely used GOP scores. Further, we also demonstrate that the proposed representation can classify learners purely based on their pronunciation proficiency, not affected by their age and gender.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128725998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
An improved parallel model combination method for noisy speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373332
H. Veisi, H. Sameti
{"title":"An improved parallel model combination method for noisy speech recognition","authors":"H. Veisi, H. Sameti","doi":"10.1109/ASRU.2009.5373332","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373332","url":null,"abstract":"In this paper a novel method, called PC-PMC, is proposed to improve the performance of automatic speech recognition systems in noisy environments. This method is based on the parallel model combination (PMC) technique and uses the Cepstral Mean Subtraction (CMS) normalization ability and Principal Component Analysis (PCA) compression and de-correlation capabilities. It takes the advantages of both additive noise compensation of PMC and convolutive noise removal ability of CMS and PCA. The first problem to be solved in the realizing of PC-PMC is that PMC algorithm requires invertible modules in the front-end of the system while CMS normalization is not an invertible process. Also, it is required to design a framework for adaptation of the PCA transform in the presence of noise. The method proposed in this paper provides solutions to the both problems. Our evaluations are done on four different real noisy tasks using Nevisa Persian continuous speech recognition system. Experimental results demonstrate significant reduction in word error rate using PC-PMC in comparison with the standard robustness methods.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130933593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2