Benoit Favre, Dilek Z. Hakkani-Tür, Slav Petrov, D. Klein. "Efficient sentence segmentation using syntactic features." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777844

Abstract: To enable downstream language processing, automatic speech recognition output must be segmented into individual sentences. Previous sentence segmentation systems have typically been very local, using low-level prosodic and lexical features to decide independently whether or not to segment at each word boundary position. In this work, we leverage global syntactic information from a syntactic parser, which is better able to capture long-distance dependencies. While some previous work has included syntactic features, ours is the first to do so in a tractable, lattice-based way, which is crucial for scaling up to long-sentence contexts. Specifically, an initial hypothesis lattice is constructed using local features, candidate sentences are then assigned syntactic language model scores, and these global syntactic scores are combined with the local low-level scores in a log-linear model. The resulting system significantly outperforms the most popular long-span model for sentence segmentation (the hidden event language model) on both reference text and automatic speech recognizer output from news broadcasts.
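The log-linear combination of local and global scores described in the abstract can be sketched as follows; the weights and log-scores below are illustrative placeholders, not the paper's trained model.

```python
def combine_scores(local_score, syntactic_score, weights=(1.0, 0.5)):
    # Log-linear model: weighted sum of log-scores.
    # The weights here are illustrative, not the paper's tuned values.
    w_local, w_syn = weights
    return w_local * local_score + w_syn * syntactic_score

# Rank candidate segmentations from a hypothesis lattice by combined log-score.
candidates = {
    "seg_a": (-12.0, -30.0),  # (local log-score, syntactic LM log-score)
    "seg_b": (-14.0, -22.0),
}
best = max(candidates, key=lambda k: combine_scores(*candidates[k]))
```

Here "seg_b" wins despite its worse local score, because the syntactic language model prefers it; that is the kind of long-distance correction the global scores contribute.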
Shilei Huang, Xiang Xie, Pascale Fung. "Using output probability distribution for OOV word rejection." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777880

Abstract: This paper proposes a method for calculating confidence scores for out-of-vocabulary (OOV) word verification based on the output probability distribution (OPD) of phoneme HMMs. Compared with the input vector of the dynamic garbage model, the OPD vector contains more information than the sorted probabilities alone. The confidence score of each phoneme is computed by an SVM with OPD vectors as input, and hypotheses are accepted or rejected based on this confidence score. Experimental results show that the proposed method achieves a lower equal error rate (EER) in a word verification task than the conventional dynamic garbage model.
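The accept/reject decision from per-phoneme confidence scores can be sketched as below; averaging the phoneme scores into a word-level score and the 0.5 threshold are illustrative assumptions, not the paper's operating point.

```python
def accept_hypothesis(phoneme_confidences, threshold=0.5):
    # Pool the per-phoneme SVM confidence scores into a word-level score
    # (a simple mean here, as an illustrative assumption) and compare it
    # against a rejection threshold.
    word_conf = sum(phoneme_confidences) / len(phoneme_confidences)
    return word_conf >= threshold

in_vocab_decision = accept_hypothesis([0.9, 0.8, 0.85])  # confident phonemes
oov_decision = accept_hypothesis([0.3, 0.2, 0.4])        # low-confidence phonemes
```

Sweeping the threshold trades false acceptances against false rejections, which is what the reported EER summarizes.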
K. T. Mengistu, M. Hannemann, T. Baum, A. Wendemuth. "Hierarchical HMM-based semantic concept labeling model." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777839

Abstract: An utterance can be conceived of as a hidden sequence of semantic concepts expressed in words or phrases. The problem of understanding the meaning underlying a spoken utterance in a dialog system can therefore be partly solved by decoding the hidden sequence of semantic concepts from the observed sequence of words. In this paper, we describe a hierarchical HMM-based semantic concept labeling model trained on semantically unlabeled data. The hierarchical model is compared with a flat concept-based model in terms of performance, ambiguity resolution, and the expressive power of its output, and the proposed method is shown to outperform the flat-concept model on all of these points.
Arindam Mandal, D. Vergyri, Wen Wang, Jing Zheng, A. Stolcke, Gökhan Tür, Dilek Z. Hakkani-Tür, N. F. Ayan. "Efficient data selection for machine translation." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777890

Abstract: The performance of statistical machine translation (SMT) systems relies on the availability of a large parallel corpus, which is used to estimate translation probabilities. However, generating such a corpus is a long and expensive process. In this paper, we introduce two methods for efficiently selecting training data to be translated by humans. Our methods are motivated by active learning and aim to choose new data that adds maximal information to the currently available data pool. The first method uses a measure of disagreement between multiple SMT systems, whereas the second uses a perplexity criterion. We performed experiments on Chinese-English data in multiple domains and test sets. Our results show that we can select only one-fifth of the additional training data and achieve similar or better translation performance than using all available data.
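A perplexity-based selection criterion can be sketched as follows. This is a minimal illustration under stated assumptions: a unigram language model with add-one smoothing stands in for the paper's language model, and candidates are ranked highest-perplexity first on the assumption that poorly modeled sentences add the most new information; the abstract does not specify either detail.

```python
import math
from collections import Counter

def unigram_perplexity(sentence, counts, total, vocab_size):
    # Add-one smoothed unigram perplexity of a sentence under the pool's LM.
    logp = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in sentence)
    return math.exp(-logp / len(sentence))

# Language model statistics from the currently available data pool.
pool = ["the cat sat".split(), "the dog ran".split()]
counts = Counter(w for s in pool for w in s)
total = sum(counts.values())
vocab_size = len(counts)

candidates = ["the cat ran".split(), "quantum flux decoheres".split()]
# Rank candidates by perplexity, highest first: sentences the pool LM models
# poorly are assumed to contribute the most new information.
ranked = sorted(candidates,
                key=lambda s: unigram_perplexity(s, counts, total, vocab_size),
                reverse=True)
```

Only the top-ranked fraction of candidates would then be sent for human translation.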
Mickael Rouvier, G. Linarès, B. Lecouteux. "On-the-fly term spotting by phonetic filtering and request-driven decoding." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777901

Abstract: This paper addresses the problem of on-the-fly term spotting in continuous speech streams. We propose a two-level architecture in which recall and accuracy are sequentially optimized. The first level uses a cascade of phonetic filters to select the speech segments that are likely to contain the targeted terms. The second level performs a request-driven decoding of the selected segments. The results show good performance of the proposed system on broadcast news data: the best configuration reaches an F-measure of about 94% while respecting the on-the-fly processing constraint.
Arianna Bisazza, Marco Dinarelli, S. Quarteroni, Sara Tonelli, Alessandro Moschitti, G. Riccardi. "Semantic annotations for conversational speech: From speech transcriptions to predicate argument structures." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777841

Abstract: In this paper, we describe semantic content, which can be automatically generated, for the design of advanced dialog systems. Since such systems will be based on machine learning approaches, we created training data by annotating a corpus with the needed content. For each sentence of our transcribed corpus, we annotate domain concepts and several other linguistic levels, ranging from basic ones, i.e. part-of-speech tagging and constituent chunking, to more advanced ones, i.e. syntactic and predicate argument structure (PAS) levels. In particular, the proposed PAS annotation and taxonomy of dialog acts appear promising for the design of more complex dialog systems. Statistics on our semantic annotation are reported.
Sebastian Spiegler, B. Golénia, Kseniya B. Shalonova, Peter A. Flach, R. Tucker. "Learning the morphology of Zulu with different degrees of supervision." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777827

Abstract: In this paper, we compare different levels of supervision for learning the morphology of the indigenous South African language Zulu. After a preliminary analysis of the Zulu data used in our experiments, we examine supervised, semi-supervised, and unsupervised approaches, comparing the strengths and weaknesses of each method. The challenges we face are limited data availability and data sparsity in connection with the morphological analysis of indigenous languages. We close by drawing conclusions for our future work towards a morphological analyzer for Zulu.
J. Zhang, Shilei Huang, Pascale Fung. "RSHMM++ for extractive lecture speech summarization." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777865

Abstract: We propose an enhanced rhetorical-state hidden Markov model (RSHMM++) for extracting hierarchical structural summaries from lecture speech. One of the most underutilized sources of information in extractive summarization is the rhetorical structure hidden in speech data; RSHMM++ automatically decodes this underlying information in order to produce better summaries. We show that RSHMM++ gives a 72.01% ROUGE-L F-measure, a 9.78% absolute increase in lecture speech summarization performance over a baseline system that does not use rhetorical information. We also propose relaxed DTW for compiling reference summaries.
Jen-Tzung Chien, C. Chueh. "Latent Dirichlet language model for speech recognition." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777875

Abstract: Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates document probabilities under a bag-of-words scheme, without considering the sequence of words, and discovers topic structure at the document level, which differs from the word-prediction concern of speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for modeling word sequences. A new Bayesian framework is introduced that merges Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events, and a robust topic-based language model is established accordingly. In continuous speech recognition experiments, LDLM obtains better performance than the probabilistic latent semantic analysis (PLSA) based language model.
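The core idea of a topic-based language model can be sketched as a mixture over latent topics: the probability of the next word is a sum, over topics, of the topic's posterior weight given the history times the topic's word probability. The topic names, probabilities, and posterior below are invented for illustration, and the sketch omits the Dirichlet priors and n-gram context that distinguish LDLM.

```python
def topic_mixture_prob(word, topic_word_probs, topic_posterior):
    # p(w | history) = sum over topics k of p(k | history) * p(w | k).
    return sum(p_k * topic_word_probs[k].get(word, 0.0)
               for k, p_k in topic_posterior.items())

# Hypothetical per-topic word distributions (only two entries shown).
topic_word_probs = {
    "sports": {"goal": 0.05, "market": 0.001},
    "finance": {"goal": 0.002, "market": 0.06},
}
# Hypothetical topic posterior inferred from the preceding words.
posterior = {"sports": 0.2, "finance": 0.8}
p_market = topic_mixture_prob("market", topic_word_probs, posterior)
```

A history about finance shifts the posterior toward the finance topic, which in turn raises the probability of finance vocabulary like "market".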
M. Kockmann, L. Burget. "Contour modeling of prosodic and acoustic features for speaker recognition." 2008 IEEE Spoken Language Technology Workshop, December 2008. DOI: 10.1109/SLT.2008.4777836

Abstract: In this paper, we use acoustic and prosodic features jointly in a long temporal context for automatic speaker recognition from speech. The contours of pitch, energy, and cepstral coefficients are modeled continuously over the time span of a syllable to capture speaking style at the phonetic level. As these features are affected by session variability, established channel compensation techniques are examined. Results for the combination of different features at the syllable level, as well as for channel compensation, are presented on the NIST SRE 2006 speaker identification task. To show the complementary character of the features, the proposed system is fused with an acoustic short-time system, leading to a relative improvement of 10.4%.
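Modeling a contour over a syllable rather than frame by frame can be illustrated with a minimal sketch: summarize the per-syllable sample sequence by its mean and least-squares slope. This low-order fit is an assumption made for illustration; the abstract does not specify the basis the authors use to model the contours.

```python
def contour_features(values):
    # Summarize a per-syllable contour (e.g. pitch or energy samples) by its
    # mean and least-squares slope, an illustrative stand-in for continuous
    # contour modeling over the syllable's time span.
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var if var else 0.0
    return mean_y, slope

pitch = [110.0, 115.0, 121.0, 124.0]  # rising pitch across one syllable
mean_pitch, pitch_slope = contour_features(pitch)
```

A rising contour yields a positive slope; such per-syllable shape parameters, unlike single-frame values, capture aspects of speaking style.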