Integrating prosodic features in extractive meeting summarization
Shasha Xie, Dilek Z. Hakkani-Tür, Benoit Favre, Yang Liu
2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), December 2009. DOI: 10.1109/ASRU.2009.5373302
Citation count: 56
Abstract
Speech contains information beyond the text itself that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using the prosodic features alone achieves better performance than using the non-prosodic information, on both human transcripts and speech recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields a further gain, outperforming the individual models.
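The abstract names two techniques without giving formulas: normalizing prosodic features relative to a speaker, topic, or local context, and combining prosodic and non-prosodic models at the decision level. The sketch below illustrates one plausible reading, assuming z-score normalization within a group (speaker or topic) or within a sliding window of neighboring sentences, and a linear interpolation of per-sentence scores for the fusion step. These are common choices, not necessarily the authors' exact method, and the function names (`z_normalize_by_group`, `z_normalize_local`, `combine_scores`) and the `window`/`weight` parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_normalize_by_group(values, groups):
    """Z-score each sentence-level feature value (e.g. mean pitch) against the
    mean and population std of its group. The group label is assumed to be a
    speaker ID or topic-segment ID; this is an illustrative assumption."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in buckets.items()}
    out = []
    for v, g in zip(values, groups):
        mu, sd = stats[g]
        # Guard against zero variance (e.g. a speaker with a single sentence).
        out.append((v - mu) / sd if sd > 0 else 0.0)
    return out


def z_normalize_local(values, window=2):
    """Z-score each value against its neighbors within +/- `window` sentences
    (a hypothetical local-context normalization)."""
    out = []
    for i, v in enumerate(values):
        ctx = values[max(0, i - window): i + window + 1]
        mu, sd = mean(ctx), pstdev(ctx)
        out.append((v - mu) / sd if sd > 0 else 0.0)
    return out


def combine_scores(prosodic, lexical, weight=0.5):
    """Decision-level fusion, assumed here to be a linear interpolation of the
    per-sentence summary scores produced by two separately trained models."""
    return [weight * p + (1 - weight) * l for p, l in zip(prosodic, lexical)]


if __name__ == "__main__":
    pitch = [100, 120, 200, 220]        # toy per-sentence mean pitch (Hz)
    speakers = ["A", "A", "B", "B"]
    print(z_normalize_by_group(pitch, speakers))
    print(z_normalize_local([1, 2, 3], window=1))
    print(combine_scores([1.0, 0.0], [0.0, 1.0], weight=0.7))
```

In this toy run, speaker B's sentences have much higher raw pitch than speaker A's, yet after per-speaker normalization both speakers map to the same relative values; that is the motivation for normalizing prosody before comparing sentences across speakers. Fusing at the score level, rather than concatenating feature vectors, lets each model be trained on its own feature set.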