Latest publications from the 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The RWTH Arabic-to-English spoken language translation system
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430145
Oliver Bender, E. Matusov, Stefan Hahn, Sasa Hasan, Shahram Khadivi, H. Ney
{"title":"The RWTH Arabic-to-English spoken language translation system","authors":"Oliver Bender, E. Matusov, Stefan Hahn, Sasa Hasan, Shahram Khadivi, H. Ney","doi":"10.1109/ASRU.2007.4430145","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430145","url":null,"abstract":"We present the RWTH phrase-based statistical machine translation system designed for the translation of Arabic speech into English text. This system was used in the Global Autonomous Language Exploitation (GALE) Go/No-Go Translation Evaluation 2007. Using a two-pass approach, we first generate n-best translation candidates and then rerank these candidates using additional models. We give a short review of the decoder as well as of the models used in both passes. We stress the difficulties of spoken language translation, i.e. how to combine the recognition and translation systems and how to compensate for missing punctuation. In addition, we cover our work on domain adaptation for the applied language models. We present translation results for the official GALE 2006 evaluation set and the GALE 2007 development set.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126409212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
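The two-pass setup described above comes down to rescoring an n-best list with a weighted combination of model scores. The sketch below illustrates that reranking step only; the feature names, weights, and candidates are toy assumptions, not the RWTH system's actual models or tuning.

```python
# Minimal sketch of second-pass n-best reranking: each candidate carries per-model
# feature scores (e.g. translation, language, and length models), and candidates are
# reordered by a log-linear combination of those scores.

def rerank_nbest(candidates, weights):
    """Return candidates sorted by a weighted sum of their feature scores."""
    def combined_score(cand):
        return sum(weights[name] * score for name, score in cand["features"].items())
    return sorted(candidates, key=combined_score, reverse=True)

nbest = [
    {"hyp": "the meeting starts tomorrow", "features": {"tm": -4.2, "lm": -10.1, "len": 4}},
    {"hyp": "meeting begins tomorrow",     "features": {"tm": -3.9, "lm": -12.4, "len": 3}},
]
weights = {"tm": 1.0, "lm": 0.6, "len": 0.1}   # illustrative weights only
print(rerank_nbest(nbest, weights)[0]["hyp"])
```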
A data-centric architecture for data-driven spoken dialog systems
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430168
S. Varges, G. Riccardi
{"title":"A data-centric architecture for data-driven spoken dialog systems","authors":"S. Varges, G. Riccardi","doi":"10.1109/ASRU.2007.4430168","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430168","url":null,"abstract":"Data is becoming increasingly crucial for training and (self-) evaluation of spoken dialog systems (SDS). Data is used to train models (e.g. acoustic models) and is 'forgotten'. Data is generated on-line from the different components of the SDS system, e.g. the dialog manager, as well as from the world it is interacting with (e.g. news streams, ambient sensors etc.). The data is used to evaluate and analyze conversational systems both on-line and off-line. We need to be able query such heterogeneous data for further processing. In this paper we present an approach with two novel components: first, an architecture for SDSs that takes a data-centric view, ensuring persistency and consistency of data as it is generated. The architecture is centered around a database that stores dialog data beyond the lifetime of individual dialog sessions, facilitating dialog mining, annotation, and logging. Second, we take advantage of the state-fullness of the data-centric architecture by means of a lightweight, reactive and inference-based dialog manager that itself is stateless. The feasibility of our approach has been validated within a prototype of a phone-based university help-desk application. We detail SDS architecture and dialog management, model, and data representation.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121132213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
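A rough illustration of the data-centric idea: dialog events are written to a persistent store as they are generated, and a stateless dialog manager (or an offline mining job) reconstructs context by querying it. The SQLite schema and field names below are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of a central store that persists dialog events beyond a single
# session so they can be queried for mining, annotation, and logging.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dialog_events (
        session_id TEXT,
        turn       INTEGER,
        speaker    TEXT,     -- 'user' or 'system'
        event_type TEXT,     -- e.g. 'asr_result', 'system_prompt'
        payload    TEXT,
        ts         REAL
    )""")
conn.execute("INSERT INTO dialog_events VALUES (?,?,?,?,?,?)",
             ("sess-001", 1, "user", "asr_result", "when is the exam", 0.0))
# A stateless dialog manager can reconstruct context by querying the store:
rows = conn.execute("SELECT turn, payload FROM dialog_events "
                    "WHERE session_id=? ORDER BY turn", ("sess-001",)).fetchall()
print(rows)
```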
Improvements in phone based audio search via constrained match with high order confusion estimates
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430191
U. Chaudhari, M. Picheny
{"title":"Improvements in phone based audio search via constrained match with high order confusion estimates","authors":"U. Chaudhari, M. Picheny","doi":"10.1109/ASRU.2007.4430191","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430191","url":null,"abstract":"This paper investigates an approximate similarity measure for searching in phone based audio transcripts. The baseline method combines elements found in the literature to form an approach based on a phonetic confusion matrix that is used to determine the similarity of an audio document and a query, both of which are parsed into phone N-grams. Experimental results show comparable performance to other approaches in the literature. Extensions of the approach are developed based on a constrained form of the similarity measure that can take into consideration the system dependent errors that can occur. This is done by accounting for higher order confusions, namely of phone bi-grams and tri-grams. Results show improved performance across a variety of system configurations.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121197364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
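The core of the approach is an approximate match between query and document phone n-grams, scored through a phone confusion matrix. The sketch below assumes a toy confusion table and a simple max-over-document scoring rule; it only illustrates the idea of confusion-aware matching, not the paper's exact measure.

```python
# Minimal sketch of confusion-aware phone n-gram matching: the score of a query
# n-gram against a document n-gram is the product of phone confusion probabilities,
# so near-misses (e.g. p/b, t/d) still contribute to the retrieval score.

def ngrams(phones, n=3):
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def ngram_sim(q, d, conf):
    score = 1.0
    for a, b in zip(q, d):
        score *= conf.get((a, b), 1.0 if a == b else 0.01)
    return score

def search_score(query_phones, doc_phones, conf, n=3):
    qs, ds = ngrams(query_phones, n), ngrams(doc_phones, n)
    return sum(max(ngram_sim(q, d, conf) for d in ds) for q in qs)

conf = {("p", "b"): 0.3, ("t", "d"): 0.3}   # toy confusion probabilities
doc = list("badkambat")                     # pretend one-character "phones"
print(search_score(list("pat"), doc, conf))
```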
Graph-based learning for phonetic classification
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430138
Andrei Alexandrescu, K. Kirchhoff
{"title":"Graph-based learning for phonetic classification","authors":"Andrei Alexandrescu, K. Kirchhoff","doi":"10.1109/ASRU.2007.4430138","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430138","url":null,"abstract":"We introduce graph-based learning for acoustic-phonetic classification. In graph-based learning, training and test data points are jointly represented in a weighted undirected graph characterized by a weight matrix indicating similarities between different samples. Classification of test samples is achieved by label propagation over the entire graph. Although this learning technique is commonly applied in semi-supervised settings, we show how it can also be used as a postprocessing step to a supervised classifier by imposing additional regularization constraints based on the underlying data manifold. We also present a technique to adapt graph-based learning to large datasets and evaluate our system on a vowel classification task. Our results show that graph-based learning improves significantly over state-of-the art baselines.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115265774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
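Label propagation itself is compact enough to sketch: labeled points are clamped to their classes, and unlabeled points iteratively absorb the similarity-weighted label distributions of their neighbours. The tiny 2-D data and RBF similarity below are illustrative only; the paper applies this to acoustic-phonetic features.

```python
# Minimal sketch of label propagation over a similarity graph.
import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1], [0.5, 0.55]])
y = np.array([0, 0, 1, 1, -1])          # -1 marks the unlabeled test point

W = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1) / 0.1)   # RBF weights
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)     # row-normalised propagation matrix

F = np.zeros((len(X), 2))
labeled = y >= 0
F[labeled, y[labeled]] = 1.0             # one-hot labels for labeled points

for _ in range(50):                      # propagate, clamping labeled points
    F = P @ F
    F[labeled] = 0.0
    F[labeled, y[labeled]] = 1.0

print(F[~labeled].argmax(axis=1))        # predicted class of the unlabeled point
```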
Data selection for speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430173
Yi Wu, Rong Zhang, Alexander I. Rudnicky
{"title":"Data selection for speech recognition","authors":"Yi Wu, Rong Zhang, Alexander I. Rudnicky","doi":"10.1109/ASRU.2007.4430173","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430173","url":null,"abstract":"This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc). In our experiment, in contrast to the common belief that \"there is no data like more data\", we found it possible to select a highly informative subset of data that produces recognition performance comparable to a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122551977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 76
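One way to read "choose data uniformly according to the distribution of some target speech unit" is a greedy selection that always picks the utterance whose units are currently most under-represented. The sketch below uses words as the unit and an assumed scoring rule; it is not the authors' exact procedure.

```python
# Minimal sketch of distribution-based data selection: greedily grow a subset whose
# unit distribution (words here; phonemes or characters work the same way) stays
# as flat as possible.
from collections import Counter

def select_uniform(utterances, budget):
    counts, selected = Counter(), []
    pool = list(utterances)
    for _ in range(min(budget, len(pool))):
        # Prefer utterances whose units have low counts in the selection so far.
        best = min(pool, key=lambda u: sum(counts[w] for w in u.split()) / len(u.split()))
        selected.append(best)
        counts.update(best.split())
        pool.remove(best)
    return selected

corpus = ["a b c", "a a a", "c d e", "b b a", "d e f g"]
print(select_uniform(corpus, budget=3))
```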
Robust speaker clustering strategies to data source variation for improved speaker diarization
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430121
Kyu Jeong Han, Samuel Kim, Shrikanth S. Narayanan
{"title":"Robust speaker clustering strategies to data source variation for improved speaker diarization","authors":"Kyu Jeong Han, Samuel Kim, Shrikanth S. Narayanan","doi":"10.1109/ASRU.2007.4430121","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430121","url":null,"abstract":"Agglomerative hierarchical clustering (AHC) has been widely used in speaker diarization systems to classify speech segments in a given data source by speaker identity, but is known to be not robust to data source variation. In this paper, we identify one of the key potential sources of this variability that negatively affects clustering error rate (CER), namely short speech segments, and propose three solutions to tackle this issue. Through experiments on various meeting conversation excerpts, the proposed methods are shown to outperform simple AHC in terms of relative CER improvements in the range of 17-32%.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122630267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
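For reference, plain AHC over speech segments looks like the sketch below: merge the closest pair of clusters until the smallest distance exceeds a threshold. The Euclidean centroid distance and threshold are illustrative stand-ins for the model-based merge criteria used in diarization, and the paper's short-segment handling is not shown.

```python
# Minimal sketch of agglomerative hierarchical clustering (AHC) over speech
# segments, each summarised by a mean feature vector.
import numpy as np

def ahc(segments, threshold):
    clusters = [[i] for i in range(len(segments))]
    feats = [segments[i] for i in range(len(segments))]
    while len(clusters) > 1:
        # find the closest pair of clusters by centroid distance
        best, best_d = None, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(feats[i] - feats[j])
                if d < best_d:
                    best, best_d = (i, j), d
        if best_d > threshold:          # stopping criterion
            break
        i, j = best
        merged = clusters[i] + clusters[j]
        centroid = np.mean([segments[k] for k in merged], axis=0)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        feats = [f for k, f in enumerate(feats) if k not in (i, j)] + [centroid]
    return clusters

segs = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]])
print(ahc(segs, threshold=1.0))   # expect two clusters, one per "speaker"
```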
OOV detection by joint word/phone lattice alignment
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430159
Hui-Ching Lin, J. Bilmes, D. Vergyri, K. Kirchhoff
{"title":"OOV detection by joint word/phone lattice alignment","authors":"Hui-Ching Lin, J. Bilmes, D. Vergyri, K. Kirchhoff","doi":"10.1109/ASRU.2007.4430159","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430159","url":null,"abstract":"We propose a new method for detecting out-of-vocabulary (OOV) words for large vocabulary continuous speech recognition (LVCSR) systems. Our method is based on performing a joint alignment between independently generated word and phone lattices, where the word-lattice is aligned via a recognition lexicon. Based on a similarity measure between phones, we can locate highly mis-aligned regions of time, and then specify those regions as candidate OOVs. This novel approach is implemented using the framework of graphical models (GMs), which enable fast flexible integration of different scores from word lattices, phone lattices, and the similarity measures. We evaluate our method on switchboard data using RT-04 as test set. Experimental results show that our approach provides a promising and scalable new way to detect OOV for LVCSR.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122846032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
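A drastically simplified view of the detection step: compare the phone labels implied by the word hypothesis with an independently decoded phone string over time, and flag words whose region is highly mis-aligned. The frame-level inputs, toy similarity table, and threshold below are assumptions; the paper operates on lattices with a graphical-model alignment.

```python
# Minimal sketch of the OOV-flagging idea on frame-level phone labels.

PHONE_SIM = {("d", "t"): 0.8, ("t", "d"): 0.8}   # toy phone similarity table

def frame_sim(a, b):
    return 1.0 if a == b else PHONE_SIM.get((a, b), 0.0)

def flag_oov(word_spans, word_phones, rec_phones, threshold=0.5):
    """word_spans: list of (word, start_frame, end_frame); *_phones: one label per frame."""
    candidates = []
    for word, s, e in word_spans:
        scores = [frame_sim(word_phones[t], rec_phones[t]) for t in range(s, e)]
        if sum(scores) / len(scores) < threshold:
            candidates.append(word)
    return candidates

word_spans  = [("the", 0, 2), ("exam", 2, 6)]
word_phones = ["dh", "ah", "ih", "g", "z", "ae"]      # phones implied by the words
rec_phones  = ["dh", "ah", "k", "ow", "m", "ae"]      # independently recognized phones
print(flag_oov(word_spans, word_phones, rec_phones))  # the 'exam' region is mis-aligned
```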
Joint decoding of multiple speech patterns for robust speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430090
N.U. Nair, T. Sreenivas
{"title":"Joint decoding of multiple speech patterns for robust speech recognition","authors":"N.U. Nair, T. Sreenivas","doi":"10.1109/ASRU.2007.4430090","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430090","url":null,"abstract":"We are addressing a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single hidden Markov model. It is shown that such a solution is possible by aligning the K patterns using the proposed multi pattern dynamic time warping algorithm followed by the constrained multi pattern Viterbi algorithm. The new formulation is tested in the context of speaker independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB signal to noise ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast a simple maximization of individual pattern likelihoods, provides only about 7 percent reduction in error rate.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130104773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
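The first stage of the method aligns the K patterns before joint decoding. The sketch below shows ordinary two-pattern DTW on 1-D features as a stand-in for the proposed multi-pattern DTW; the constrained multi-pattern Viterbi decoding itself is not reproduced.

```python
# Minimal sketch of dynamic time warping between two renditions of the same word:
# the returned path could then constrain a joint decode of both patterns.
import numpy as np

def dtw_path(x, y):
    nx, ny = len(x), len(y)
    cost = np.full((nx + 1, ny + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # backtrace the optimal warping path
    path, i, j = [], nx, ny
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return cost[nx, ny], path[::-1]

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])          # two noisy renditions of a pattern
b = np.array([0.0, 0.9, 1.1, 2.1, 0.9, 0.1])
total, path = dtw_path(a, b)
print(total, path)
```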
Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430129
Tara N. Sainath, D. Kanevsky, B. Ramabhadran
{"title":"Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations","authors":"Tara N. Sainath, D. Kanevsky, B. Ramabhadran","doi":"10.1109/ASRU.2007.4430129","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430129","url":null,"abstract":"In many pattern recognition tasks, given some input data and a model, a probabilistic likelihood score is often computed to measure how well the model describes the data. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures, though recently they have been used to derive a gradient steepness measurement to evaluate the quality of the model to match the distribution of the data. In this paper, we explore applying the EBW gradient steepness metric in the context of Hidden Markov Models (HMMs) for recognition of broad phonetic classes and present a detailed analysis and results on the use of this gradient metric on the TIMIT corpus. We find that our gradient metric is able to outperform the baseline likelihood method, and offers improvements in noisy conditions.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125305817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
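For context, the widely used EBW re-estimation of a Gaussian mean in discriminative training is shown below, with numerator/denominator occupancies and a smoothing constant D; the paper's gradient-steepness metric is derived from these transformations but is not reproduced here.

```latex
% Standard EBW update of the mean of Gaussian m in state j, shown for context only.
\[
  \hat{\mu}_{jm} \;=\;
  \frac{\sum_t \gamma^{\mathrm{num}}_{jm}(t)\,x_t \;-\; \sum_t \gamma^{\mathrm{den}}_{jm}(t)\,x_t \;+\; D_{jm}\,\mu_{jm}}
       {\sum_t \gamma^{\mathrm{num}}_{jm}(t) \;-\; \sum_t \gamma^{\mathrm{den}}_{jm}(t) \;+\; D_{jm}}
\]
```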
Empirical study of neural network language models for Arabic speech recognition
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430100
Ahmad Emami, L. Mangu
{"title":"Empirical study of neural network language models for Arabic speech recognition","authors":"Ahmad Emami, L. Mangu","doi":"10.1109/ASRU.2007.4430100","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430100","url":null,"abstract":"In this paper we investigate the use of neural network language models for Arabic speech recognition. By using a distributed representation of words, the neural network model allows for more robust generalization and is better able to fight the data sparseness problem. We investigate different configurations of the neural probabilistic model, experimenting with such parameters as N-gram order, output vocabulary, normalization method, and model size and parameters. Experiments were carried out on Arabic broadcast news and broadcast conversations data and the optimized neural network language models showed significant improvements over the baseline N-gram model.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128802748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
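The models studied are in the spirit of feedforward neural probabilistic language models: embed the N-1 history words, concatenate the embeddings, and predict the next word through a hidden layer. The sketch below uses toy dimensions and random data and is only a generic instance of that architecture, not the paper's configuration.

```python
# Minimal sketch of a feedforward neural network language model with a distributed
# (embedding) representation of the history words.
import torch
import torch.nn as nn

class FeedForwardNNLM(nn.Module):
    def __init__(self, vocab_size, order=4, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear((order - 1) * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, history):                  # history: (batch, order-1) word ids
        e = self.embed(history).flatten(1)       # concatenate history embeddings
        return self.out(torch.tanh(self.hidden(e)))   # logits over the next word

vocab_size = 100
model = FeedForwardNNLM(vocab_size)
history = torch.randint(0, vocab_size, (8, 3))   # batch of 3-word histories
target = torch.randint(0, vocab_size, (8,))
loss = nn.functional.cross_entropy(model(history), target)
loss.backward()                                  # gradients for one training step
print(float(loss))
```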