Title: Markovian combination of language and prosodic models for better speech understanding and recognition
Authors: A. Stolcke, Elizabeth Shriberg
Venue: IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.
Published: 2001-12-09
DOI: 10.1109/ASRU.2001.1034615
Citations: 3
Abstract
Summary form only given. Traditionally, "language" models capture only the word sequences of a language. A crucial component of spoken language, however, is its prosody, i.e., its rhythmic and melodic properties. This paper summarizes recent work on integrated, computationally efficient modeling of word sequences and prosodic properties of speech for a variety of speech recognition and understanding tasks, such as dialog act tagging, disfluency detection, and segmentation into sentences and topics. In each case, hidden Markov representations of the underlying structures and their associated observations arise naturally, and they allow existing speech recognizers to be combined with separately trained prosodic classifiers. The same HMM-based models can be used in two modes: to recover hidden structure (such as sentence boundaries), or to evaluate speech recognition hypotheses, thereby integrating prosody into the recognition process.
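
The two modes described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: every function name, the two-state layout (0 = no boundary, 1 = sentence boundary), and all probabilities below are made up for illustration. It assumes that a fixed transition table stands in for the language model and that the prosodic classifier's posteriors are divided by the state priors to obtain scaled likelihoods that serve as HMM emission scores. Viterbi decoding recovers the hidden boundary sequence, and the forward algorithm yields a single score that could be used to rank a recognition hypothesis.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of log values."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def scaled_loglik(posteriors, priors):
    """Assumed conversion: posterior / prior, proportional to P(prosody | state)."""
    return [[math.log(p / q) for p, q in zip(row, priors)] for row in posteriors]

def viterbi(log_init, log_trans, log_emit):
    """Mode 1: recover the most likely hidden state sequence."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][j] + log_emit[t][j])
        score = new_score
        backpointers.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def forward_score(log_init, log_trans, log_emit):
    """Mode 2: total log score of the observations, summed over all state paths."""
    n_states = len(log_init)
    alpha = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit[t][j]
            for j in range(n_states)
        ]
    return logsumexp(alpha)

# Toy numbers (hypothetical): four inter-word positions in one hypothesis.
priors = [0.9, 0.1]                          # P(no boundary), P(boundary)
log_init = [math.log(p) for p in priors]
log_trans = [[math.log(0.9), math.log(0.1)],     # stand-in for the language model
             [math.log(0.95), math.log(0.05)]]
# Hypothetical prosodic classifier posteriors P(state | prosodic features).
posteriors = [[0.95, 0.05], [0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
log_emit = scaled_loglik(posteriors, priors)

boundaries = viterbi(log_init, log_trans, log_emit)
hypothesis_score = forward_score(log_init, log_trans, log_emit)
print(boundaries, hypothesis_score)
```

With these toy numbers the decoder places the single sentence boundary at the third position, where the prosodic posterior is strongest. In a fuller system the fixed transition table would presumably be replaced by word-conditioned language model probabilities, and `hypothesis_score` would be computed for each competing recognition hypothesis.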