{"title":"Directory assistance: learning user formulations for business listings","authors":"C. Popovici, M. Andorno, P. Laface, L. Fissore, M. Nigra, C. Vair","doi":"10.1109/ASRU.2001.1034634","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034634","url":null,"abstract":"One of the main problems in automatic directory assistance (DA) for business listings is that customers formulate their requests for the same listing with great variability. We show that an automatic approach allows the detection, from field data, of user formulations that were not foreseen by the designers, and that they can be added, as variants, to the denominations already included in the system to reduce its failures.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116802537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating long-term spectral subtraction for reverberant ASR","authors":"David Gelbart, Nelson Morgan","doi":"10.1109/ASRU.2001.1034598","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034598","url":null,"abstract":"Even a modest degree of room reverberation can greatly increase the difficulty of automatic speech recognition. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from head-mounted microphones. In this paper, we describe experiments with a proposed remedy based on the subtraction of an estimate of the log spectrum from a long-term (e.g., 2 s) analysis window, followed by overlap-add resynthesis. Since the technique is essentially one of enhancement, the processed signal it generates can be used as input for complete speech recognition systems. Here we report results with both the HTK and the SRI Hub-5 recognizer. For simpler recognizer configurations and/or moderate-sized training, the improvements are huge, while moderate improvements are still observed for more complex configurations under a number of conditions.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117213225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"n-gram and decision tree based language identification for written words","authors":"J. Hakkinen, Jilei Tian","doi":"10.1109/ASRU.2001.1034655","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034655","url":null,"abstract":"As the demand for multilingual speech recognizers increases, the development of systems which combine automatic language identification, language-specific pronunciation modeling and language-independent acoustic models becomes increasingly important. When the recognition grammar is dynamic and obtained directly from written text, the language associated with each grammar item has to be identified using that text. Many methods proposed in the literature require fairly large amounts of text, which may not always be available. This paper describes a text-based language identification system developed for the identification of the language of short words, e.g., proper names. Two different approaches are compared. The n-gram method commonly used in the literature is first reviewed and further enhanced. We also propose a simple method for language identification that is based on decision trees. The methods are first evaluated in a text-based language identification task. Both methods are also tested as preprocessors for a multilingual speech recognition task, where the language of each text item has to be determined, in order to choose the correct text-to-pronunciation mapping. The experimental results show that the proposed methods perform very well, and merit further development.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124946275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The ALERT system: advanced broadcast speech recognition technology for selective dissemination of multimedia information","authors":"G. Rigoll","doi":"10.1109/ASRU.2001.1034647","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034647","url":null,"abstract":"This paper presents a brief description of the ALERT system, which is under development by a consortium working on a research project sponsored by the European Commission. The ALERT system uses advanced speech recognition technology and video processing techniques in order to process large broadcast speech archives and multimedia information resources for the purpose of extracting specific information from such databases and inform selected customers about its contents. It is one of the most ambitious projects currently carried out in the human language technologies (HLT) area (see also http://alert.uni-duisburg.de). The paper describes the objectives of the overall system, its basic system architecture and the scientific approach taken in order to realize the specified demonstrators.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114565900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CORBA-based speech-to-speech translation system","authors":"R. Gruhn, K. Takashima, A. Nishino, S. Nakamura","doi":"10.1109/ASRU.2001.1034660","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034660","url":null,"abstract":"We describe the new implementation of a speech-to-speech translation system at ATR Spoken Language Translation Research Laboratories (SLT). We use the architecture standard CORBA (Common Object Request Broker Architecture) to interface between a speech recognizer, translation system and TTS engine. Various input types are supported, including close-talking microphone and telephony hardware.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129177184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition","authors":"Shang-Ming Lee, Shih-Hau Fang, J. Hung, Lin-Shan Lee","doi":"10.1109/ASRU.2001.1034586","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034586","url":null,"abstract":"Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform very well under most conditions, only limited efforts have been made to optimize the shape of the filters in the filter-bank of the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this new approach, the filter-bank coefficients are data-driven and obtained by applying principal component analysis (PCA) to the FFT spectrum of the training data. The experimental results show that this method is robust in noisy environments and combines well with other noise-handling techniques.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123726179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language models beyond word strings","authors":"E. Noth, A. Batliner, H. Niemann, G. Stemmer, F. Gallwitz, J. Spilker","doi":"10.1109/ASRU.2001.1034614","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034614","url":null,"abstract":"In this paper we want to show how n-gram language models can be used to provide additional information in automatic speech understanding systems beyond the pure word chain. This becomes important in the context of conversational dialogue systems that have to recognize and interpret spontaneous speech. We show how n-grams can: (1) help to classify prosodic events like boundaries and accents; (2) be extended to directly provide boundary information in the speech recognition phase; (3) help to process speech repairs; and (4) detect and semantically classify out-of-vocabulary words. The approaches can work on the best word chain or a word hypotheses graph. Examples and experimental results are provided from our own research within the EVAR information retrieval system and the VERBMOBIL speech-to-speech translation system.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116483795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dialogue management in the Talk'n'Travel system","authors":"D. Stallard","doi":"10.1109/ASRU.2001.1034631","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034631","url":null,"abstract":"A central problem for mixed-initiative dialogue management is coping with user utterances that fall outside of the expected sequence of dialogue. Independent initiative by the user may require a complete revision of the future course of the dialogue, even when the system is engaged in activities of its own, such as querying a database, etc. This paper presents an event-driven, goal-based dialogue manager component we have developed to cope with these challenges. The dialog manager is explicitly designed for asynchronous input and flexible control, and uses a tree-ordered rule language we have developed that also provides for close coupling with discourse processing. The dialogue manager is implemented as part of Talk'n'Travel, a simulated air travel reservation dialogue system we have developed under the US DARPA Communicator dialogue research program, whose purpose and scope we also briefly summarize.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114688771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The statistical approach to spoken language translation","authors":"H. Ney","doi":"10.1109/ASRU.2001.1034663","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034663","url":null,"abstract":"This paper gives an overview of our work on statistical machine translation of spoken dialogues, in particular in the framework of the VERBMOBIL project. The goal of the VERBMOBIL project is the translation of spoken dialogues in the domains of appointment scheduling and travel planning. Starting with the Bayes decision rule as in speech recognition, we show how the required probability distributions can be structured into three parts: the language model, the alignment model and the lexicon model. We describe the components of the system and report results on the VERBMOBIL task. The experience obtained in the VERBMOBIL project, in particular a large-scale end-to-end evaluation, showed that the statistical approach resulted in significantly lower error rates than three competing translation approaches: the sentence error rate was 29% in comparison with 52% to 62% for the other translation approaches. Finally, we discuss the integrated approach to speech translation as opposed to the serial approach that is widely used nowadays.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126359279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing consensus translation from multiple machine translation systems","authors":"B. Bangalore, Germán Bordel, G. Riccardi","doi":"10.1109/ASRU.2001.1034659","DOIUrl":"https://doi.org/10.1109/ASRU.2001.1034659","url":null,"abstract":"We address the problem of computing a consensus translation given the outputs from a set of machine translation (MT) systems. The translations from the MT systems are aligned with a multiple string alignment algorithm and the consensus translation is then computed. We describe the multiple string alignment algorithm and the consensus MT hypothesis computation. We report on the subjective and objective performance of the multilingual acquisition approach on a limited domain spoken language application. We evaluate five domain-independent off-the-shelf MT systems and show that the consensus-based translation performance is equal to or better than any of the given MT systems, in terms of both objective and subjective measures.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126434804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}