{"title":"An extractive-summarization baseline for the automatic detection of noteworthy utterances in multi-party human-human dialog","authors":"S. Banerjee, Alexander I. Rudnicky","doi":"10.1109/SLT.2008.4777869","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777869","url":null,"abstract":"Our goal is to reduce meeting participants' note-taking effort by automatically identifying utterances whose contents meeting participants are likely to include in their notes. Though note-taking is different from meeting summarization, these two problems are related. In this paper we apply techniques developed in extractive meeting summarization research to the problem of identifying noteworthy utterances. We show that these algorithms achieve an f-measure of 0.14 over a 5-meeting sequence of related meetings. The precision - 0.15 - is triple that of the trivial baseline of simply labeling every utterance as noteworthy. We also introduce the concept of ldquoshow-worthyrdquo utterances - utterances that contain information that could conceivably result in a note. We show that such utterances can be recognized with an 81% accuracy (compared to 53% accuracy of a majority classifier). Further, if non-show-worthy utterances are filtered out, the precision of noteworthiness detection improves by 33% relative.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129273988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Better statistical estimation can benefit all phrases in phrase-based statistical machine translation","authors":"K. Sima'an, M. Mylonakis","doi":"10.1109/SLT.2008.4777884","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777884","url":null,"abstract":"The heuristic estimates of conditional phrase translation probabilities are based on frequency counts in a word-aligned parallel corpus. Earlier attempts at more principled estimation using Expectation-Maximization (EM) under perform this heuristic. This paper shows that a recently introduced novel estimator based on smoothing might provide a good alternative. When all phrase pairs are estimated (no length cut-off), this estimator slightly outperforms the heuristic estimator.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132189642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiments in speech driven question answering","authors":"César González Ferreras, Valentín Cardeñoso-Payo, E. Arnal","doi":"10.1109/SLT.2008.4777846","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777846","url":null,"abstract":"In this paper we present a system that allows users to obtain the answer to a given spoken question expressed in natural language. A large vocabulary continuous speech recognizer is used to transcribe the spoken question into text. Then, a question answering engine is used to obtain the answer to the question. Some improvements over the baseline system were proposed in order to adapt the output of the speech recognizer to the question answering engine: capitalized output from the speech recognizer and a language model for questions. System performance was evaluated using a standard question answering test suite from CLEF. Results showed that the proposed approach outperforms the baseline system both in WER and in over-all system accuracy.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129708353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic labeling of contrastive word pairs from spontaneous spoken english","authors":"Leonardo Badino, R. Clark","doi":"10.1109/SLT.2008.4777850","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777850","url":null,"abstract":"This paper addresses the problem of automatically labeling contrast in spontaneous spoken speech, where contrast here is meant as a relation that ties two words that explicitly contrast with each other. Detection of contrast is certainly relevant in the analysis of discourse and information structure and also, because of the prosodic correlates of contrast, could play an important role in speech applications, such as text-to-speech synthesis, that need an accurate and discourse context related modeling of prosody. With this prospect we investigate the feasibility of automatic contrast labeling by training and evaluating on the Switchboard corpus a novel contrast tagger, based on support vector machines (SVM), that combines lexical features, syntactic dependencies and WordNet semantic relations.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129838587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiences designing a voice interface for rural India","authors":"Neil Patel, Sheetal K. Agarwal, Nitendra Rajput, A. A. Nanavati, Paresh Dave, Tapan S. Parikh","doi":"10.1109/SLT.2008.4777830","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777830","url":null,"abstract":"In this paper we describe our experiences designing a voice interface in rural India. We outline our design process from initial contextual inquiry to a formal user evaluation, and use this discussion to motivate research guidelines for others designing voice interfaces in developing regions. Our three guidelines are to build around existing information systems, to iterate on the design through user testing, and to explore design alternatives through empirical analysis. We also share some practical lessons learned in designing, implementing, and evaluating information systems for developing regions in general.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"76 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130803749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word-lattice based spoken-document indexing with standard text indexers","authors":"F. Seide, K. Thambiratnam, Roger Peng Yu","doi":"10.1109/SLT.2008.4777898","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777898","url":null,"abstract":"Indexing the spoken content of audio recordings requires automatic speech recognition, which is as of today not reliable. Unlike indexing text, we cannot reliably know from a speech recognizer whether a word is present at a given point in the audio; we can only obtain a probability for it. Correct use of these probabilities significantly improves spoken-document search accuracy. In this paper, we will first describe how to improve accuracy for \"web-search style\" (AND/phrase) queries into audio, by utilizing speech recognition alternates and word posterior probabilities based on word lattices. Then, we will present an end-to-end approach to doing so using standard text indexers, which by design cannot handle probabilities and unaligned alternates. We present a sequence of approximations that transform the numeric lattice-matching problem into a symbolic text-based one that can be implemented by a commercial full-text indexer. Experiments on a 170-hour lecture set show an accuracy improvement by 30-60% for phrase searches and by 130% for two-term AND queries, compared to indexing linear text.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123695432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A response generation in the Mongolian spoken language system for accessing to multimedia knowledge base","authors":"Munkhtuya Davaatsagaan, K. Paliwal","doi":"10.1109/SLT.2008.4777838","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777838","url":null,"abstract":"By using automatic speech recognition (ASR) and text to speech (TTS) systems, which have been available in Mongolian for last few years, this research set out to implement a new version of the Mongolian Virtual Education Environment (VEE) that has not included a speech interface. The spoken language system aims to provide a natural interface between trainees and the environment by using simple and natural dialogues to enable the user to access the multimedia knowledge base of the VEE. We have worked on the response generation part of the system. This paper describes a TTS system for the VEE for university courses held in Mongolian. A concatenative speech synthesizer for Mongolian is applied for the TTS in response generation. A Festvox framework for unit selection speech synthesis was used to build the Mongolian voice. We discuss aspects of the voice development process and the results of a perceptual test of the synthesized voice.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126601462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-resource speech translation of Urdu to English using semi-supervised part-of-speech tagging and transliteration","authors":"A. Aminzadeh, Wade Shen","doi":"10.1109/SLT.2008.4777891","DOIUrl":"https://doi.org/10.1109/SLT.2008.4777891","url":null,"abstract":"This paper describes the construction of ASR and MT systems for translation of speech from Urdu into English. As both Urdu pronunciation lexicons and Urdu-English bitexts are sparse, we employ several techniques that make use of semi-supervised annotation to improve ASR and MT training. Specifically, we describe 1) the construction of a semi-supervised HMM-based part-of-speech tagger that is used to train factored translation models and 2) the use of an HMM-based transliterator from which we derive a spelling-to-pronunciation model for Urdu used in ASR training. We describe experiments performed for both ASR and MT training in the context of the Urdu-to-English task of the NIST MT08 Evaluation and we compare methods making use of additional annotation with standard statistical MT and ASR baselines.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121571833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}