Speech transcript analysis for automatic search
A. Coden
Proceedings of the 34th Annual Hawaii International Conference on System Sciences
Published: 2001-01-03
DOI: 10.1109/HICSS.2001.926473 (https://doi.org/10.1109/HICSS.2001.926473)
Citation count: 18
Abstract
We address the problem of finding collateral information pertinent to a live television broadcast in real time. The solution starts with a text transcript of the broadcast generated by an automatic speech recognition system. Speaker-independent speech recognition technology, even when tailored for a broadcast scenario, generally produces transcripts with relatively low accuracy. Given this limitation, we have developed algorithms that can determine the essence of the broadcast from these transcripts. Specifically, we extract named entities, topics, and sentence types from the transcript and use them to automatically generate both structured and unstructured search queries. A novel distance-ranking algorithm is used to select relevant information from the search results. The whole process is performed online, and the query results (i.e., the collateral information) are added to the broadcast stream.
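
As a rough illustration of the query-generation step described in the abstract, the following minimal sketch turns a lowercase, unpunctuated transcript into one unstructured and one structured query. It is not the paper's algorithm: the gazetteer lookup for named entities, the word-frequency heuristic for topics, the stopword list, and every function and field name (`extract_entities`, `extract_topics`, `build_queries`, `entity`, `topic`) are assumptions made for this example. Sentence-type detection and the distance-ranking step are left out because the abstract does not describe them.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "and", "for", "with", "met", "in", "to", "a", "of"}


def extract_entities(transcript, gazetteer):
    """Return gazetteer names that occur in the transcript (case-insensitive).

    A gazetteer lookup is an assumption: ASR output usually lacks the
    capitalization that many named-entity taggers rely on.
    """
    text = transcript.lower()
    return [name for name in gazetteer if name.lower() in text]


def extract_topics(transcript, k=3):
    """Return the k most frequent non-stopword tokens as crude topic terms."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]


def build_queries(entities, topics):
    """Build an unstructured keyword query and a structured field query."""
    # Unstructured: quoted entity phrases followed by bare topic terms.
    unstructured = " ".join([f'"{e}"' for e in entities] + topics)
    # Structured: field names here are illustrative, not from the paper.
    structured = {"entity": list(entities), "topic": list(topics)}
    return unstructured, structured


transcript = (
    "the president met the prime minister in paris "
    "to discuss trade trade tariffs and trade policy"
)
entities = extract_entities(transcript, gazetteer=["Paris", "Berlin"])
topics = extract_topics(transcript, k=1)
unstructured, structured = build_queries(entities, topics)
```

Keeping the two query forms separate lets the same extracted features drive both a free-text search engine and a fielded database lookup, which matches the abstract's distinction between unstructured and structured queries.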