2012 IEEE Spoken Language Technology Workshop (SLT): latest papers, page 8

Topic n-gram count language model adaptation for speech recognition

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424216

Md. Akmal Haidar, D. O'Shaughnessy

Abstract: We introduce novel language model (LM) adaptation approaches using the latent Dirichlet allocation (LDA) model. Observed n-grams in the training set are assigned to topics using soft and hard clustering. In soft clustering, each n-gram is assigned to topics such that the total count of that n-gram over all topics equals its global count in the training set: the normalized topic weights of the n-gram are multiplied by the global n-gram count to form the topic n-gram counts for the respective topics. In hard clustering, each n-gram is assigned to the single topic with the maximum topic weight for that n-gram, and receives the corresponding (maximum) fraction of the global n-gram count. The topic n-gram count LMs are created from the respective topic n-gram counts and adapted using the topic weights of a development test set. The topic weights are computed by averaging two confidence measures, the probability of a word given a topic and the probability of a topic given a word; the average is taken over the words in an n-gram to form the topic weights of that n-gram, and over the words in the development test set to form the topic weights of that set. Our approaches outperform some traditional approaches on the WSJ corpus.

Citations: 18
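
The soft and hard clustering of n-gram counts described in the entry above can be illustrated with a small sketch. This is not the authors' implementation: the function name `topic_ngram_counts`, the toy n-grams, and the topic weights are all hypothetical, and the weights are assumed to be given already (in the paper they come from LDA-based confidence measures, which are not reproduced here).

```python
from collections import defaultdict


def topic_ngram_counts(global_counts, topic_weights, mode="soft"):
    """Split global n-gram counts into per-topic counts.

    global_counts: dict mapping an n-gram (tuple of words) to its training-set count.
    topic_weights: dict mapping the same n-grams to a list of non-negative
                   topic weights (assumed given; hypothetical values below).
    mode: "soft" spreads each count over all topics,
          "hard" keeps only the share of the highest-weighted topic.
    """
    num_topics = len(next(iter(topic_weights.values())))
    per_topic = [defaultdict(float) for _ in range(num_topics)]
    for ngram, count in global_counts.items():
        weights = topic_weights[ngram]
        total = sum(weights)
        if total <= 0:
            raise ValueError(f"topic weights for {ngram} must sum to a positive value")
        normalized = [w / total for w in weights]
        if mode == "soft":
            # Every topic receives its normalized share; shares sum to the global count.
            for k, share in enumerate(normalized):
                per_topic[k][ngram] += share * count
        elif mode == "hard":
            # Only the topic with the maximum weight receives its share of the count.
            best = max(range(num_topics), key=lambda k: normalized[k])
            per_topic[best][ngram] += normalized[best] * count
        else:
            raise ValueError("mode must be 'soft' or 'hard'")
    return per_topic


# Toy data (hypothetical): two bigrams, two topics.
counts = {("stock", "market"): 10, ("home", "run"): 4}
weights = {("stock", "market"): [3, 1], ("home", "run"): [1, 3]}

soft = topic_ngram_counts(counts, weights, mode="soft")
hard = topic_ngram_counts(counts, weights, mode="hard")
print(dict(soft[0]), dict(soft[1]))
print(dict(hard[0]), dict(hard[1]))
```

In the soft case the per-topic counts of each n-gram add up to its global count, whereas in the hard case each n-gram appears in exactly one topic with a reduced count. Per-topic LMs would then be estimated from these counts with any standard n-gram toolkit.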

Statistical methods for varying the degree of articulation in new HMM-based voices

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424238

B. Picart, Thomas Drugman, T. Dutoit

Citations: 1

Automatic classification of unequal lexical stress patterns using machine learning algorithms

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424255

M. Shahin, B. Ahmed, K. Ballard

Abstract: Technology-based speech therapy systems are severely handicapped by the absence of accurate prosodic event identification algorithms. This paper introduces an automatic method for classifying strong-weak (SW) and weak-strong (WS) stress patterns in the speech of children with an American English accent, for use in the assessment of speech dysprosody. We investigate the ability of two sets of features used to train classifiers to identify the variation in lexical stress between two consecutive syllables. The first set consists of traditional features derived from measurements of pitch, intensity and duration, whereas the second set consists of the energies of different filter banks. Three different classifiers were used in the experiments: an Artificial Neural Network (ANN) classifier with a single hidden layer, a Support Vector Machine (SVM) classifier with both linear and Gaussian kernels, and a Maximum Entropy (MaxEnt) model. The best results were obtained using an ANN classifier and a combination of the two sets of features. The system correctly classified 94% of the SW stress patterns and 76% of the WS stress patterns.

Citations: 14

Combining multiple translation systems for Spoken Language Understanding portability

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424221

Fernando García, L. Hurtado, E. Segarra, E. Arnal, G. Riccardi

Abstract: We are interested in the problem of learning Spoken Language Understanding (SLU) models for multiple target languages. Learning such models requires annotated corpora, and porting to different languages would require corpora with parallel text translation and semantic annotations. In this paper we investigate how to learn an SLU model in a target language starting from no target text and no semantic annotation. Our proposed algorithm exploits the diversity (with regard to performance and coverage) of multiple translation systems to transfer statistically stable word-to-concept mappings for the Romance language pair French and Spanish. Each translation system performs differently at the lexical level (as measured by BLEU). The best performance on the semantic task is obtained by combining the translation systems at different stages of the portability methodology. We have evaluated the portability algorithms on the French MEDIA corpus, using French as the source language and Spanish as the target language. The experiments show the effectiveness of the proposed methods with respect to the source language SLU baseline.

Citations: 18

Joint language models for automatic speech recognition and understanding

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424222

Ali Orkan Bayer, G. Riccardi

Citations: 10

Syllable-based prosodic analysis of Amharic read speech

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424232

O. Jokisch, Y. Gebremedhin, R. Hoffmann

Citations: 2

Audio-visual feature integration based on piecewise linear transformation for noise robust automatic speech recognition

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424213

Yosuke Kashiwagi, Masayuki Suzuki, N. Minematsu, K. Hirose

Abstract: Multimodal speech recognition is a promising approach to noise-robust automatic speech recognition (ASR) and is currently attracting the attention of many researchers. Multimodal ASR uses not only audio features, which are sensitive to background noise, but also non-audio features such as lip shapes to achieve noise robustness. Although various methods have been proposed to integrate audio-visual features, how best to integrate audio and visual features is still under discussion. The weights of the audio and visual features should be decided according to the noise characteristics and levels: in general, larger weights should be given to visual features when the noise level is high, and vice versa. But how can this be controlled? In this paper, we propose a feature integration method based on piecewise linear transformation. In contrast to other feature integration methods, our proposed method can appropriately change the weights depending on the state of an observed noisy feature, which carries information on both the uttered phonemes and the environmental noise. Experiments on noisy speech recognition are conducted following CENSREC-1-AV, and a word error reduction rate of around 24% on average is achieved compared with a decision fusion method.

Citations: 7
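
To make the idea of a state-dependent piecewise linear transformation in the entry above more concrete, here is a minimal sketch of one common form of such a transformation: the feature space is softly partitioned by a set of diagonal Gaussians, each region has its own affine transform, and the output is the posterior-weighted sum of the per-region outputs. The abstract does not give these details, so the partitioning scheme, the function names, and all numbers below are assumptions for illustration only.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def region_posteriors(x, regions):
    """Posterior probability of each region given x.

    regions: list of (prior, mean, var) tuples, one per region (hypothetical model).
    """
    logs = [math.log(p) + log_gaussian(x, m, v) for p, m, v in regions]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def affine(matrix, bias, x):
    """Compute matrix @ x + bias with plain lists."""
    return [sum(a * xi for a, xi in zip(row, x)) + b for row, b in zip(matrix, bias)]


def piecewise_linear_transform(x, regions, transforms):
    """Blend per-region affine transforms with the region posteriors of x."""
    posteriors = region_posteriors(x, regions)
    out = [0.0] * len(transforms[0][1])
    for p, (matrix, bias) in zip(posteriors, transforms):
        y = affine(matrix, bias, x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out


# Toy setup (hypothetical): x = [audio_feature, visual_feature].
# Region 0 stands for a "clean-like" state and trusts the audio dimension;
# region 1 stands for a "noisy-like" state and trusts the visual dimension.
regions = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]
transforms = [([[1.0, 0.0]], [0.0]), ([[0.0, 1.0]], [0.0])]

clean_like = piecewise_linear_transform([0.1, 0.9], regions, transforms)
noisy_like = piecewise_linear_transform([5.2, 4.0], regions, transforms)
print(clean_like, noisy_like)
```

Because the region posteriors depend on the observed feature itself, the effective weighting of the audio and visual dimensions changes from frame to frame without an explicit noise-level estimate. In a real system the regions and transforms would be learned from data rather than set by hand.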

The FAU Video Lecture Browser system

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424256

K. Riedhammer, Martin Gropp, E. Nöth

Citations: 7

What makes this voice sound so bad? A multidimensional analysis of state-of-the-art text-to-speech systems

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424229

Florian Hinterleitner, C. Norrenbrock, S. Möller, U. Heute

Citations: 14

Optimization of the DET curve in speaker verification

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424243

Leibny Paola García-Perera, J. Nolazco-Flores, B. Raj, R. Stern

Abstract: Speaker verification systems are, in essence, statistical pattern detectors that can trade off false rejections for false acceptances. Any operating point characterized by a specific tradeoff between false rejections and false acceptances may be chosen. Training paradigms in speaker verification systems, however, either learn the parameters of the classifier without considering this tradeoff, or optimize the parameters for a particular operating point exemplified by the ratio of positive and negative training instances supplied. In this paper we investigate training paradigms that explicitly consider the tradeoff between false rejections and false acceptances by minimizing the area under the detection error tradeoff (DET) curve. To optimize the parameters, we explicitly minimize a mathematical characterization of this area through generalized probabilistic descent. Experiments on the NIST 2008 database show that for clean signals the proposed optimization approach is at least as effective as conventional learning. On noisy data, verification performance obtained with the proposed approach is considerably better than that obtained with conventional learning methods.

Citations: 11
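
As a rough illustration of the quantity minimized in the entry above: when the false rejection rate is plotted against the false acceptance rate on linear probability axes, the area under the curve equals the fraction of (target, impostor) score pairs in which the impostor outscores the target. The sketch below computes that area and a smooth sigmoid surrogate of the kind a gradient-based method could descend. The exact characterization used in the paper is not given in the abstract, so the function names, the `sharpness` parameter, and the scores are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def det_area(target_scores, impostor_scores):
    """Empirical area under the false-rejection vs. false-acceptance curve.

    Counts the fraction of (target, impostor) pairs ranked the wrong way round;
    ties count as half an error.
    """
    errors = 0.0
    for t in target_scores:
        for i in impostor_scores:
            if i > t:
                errors += 1.0
            elif i == t:
                errors += 0.5
    return errors / (len(target_scores) * len(impostor_scores))


def smooth_det_area(target_scores, impostor_scores, sharpness=10.0):
    """Differentiable surrogate: the step function is replaced by a sigmoid."""
    total = 0.0
    for t in target_scores:
        for i in impostor_scores:
            total += sigmoid(sharpness * (i - t))
    return total / (len(target_scores) * len(impostor_scores))


# Toy verification scores (hypothetical): higher means "more likely the target speaker".
targets = [2.0, 1.5, 0.3]
impostors = [0.5, -1.0, 0.0]

exact = det_area(targets, impostors)
approx = smooth_det_area(targets, impostors, sharpness=50.0)
print(exact, approx)
```

A larger `sharpness` makes the surrogate track the empirical area more closely but yields steeper, less informative gradients, so in practice it would be tuned. Note that DET curves are conventionally drawn on normal-deviate axes; the area here refers to the linear-axis version only.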