{"title":"语音文本分析自动搜索","authors":"A. Coden","doi":"10.1109/HICSS.2001.926473","DOIUrl":null,"url":null,"abstract":"We address the problem of finding collateral information pertinent to a live television broadcast in real time. The solution starts with a text transcript of the broadcast generated by an automatic speech recognition system. Speaker independent speech recognition technology, even when tailored for a broadcast scenario, generally produces transcripts with relatively low accuracy. Given this limitation, we have developed algorithms that can determine the essence of the broadcast from these transcripts. Specifically, we extract named entities, topics, and sentence types from the transcript and use them to automatically generate both structured and unstructured search queries. A novel distance-ranking algorithm is used to select relevant information from the search results. The whole process is performed online and the query results (i.e., the collateral information) are added to the broadcast stream.","PeriodicalId":201648,"journal":{"name":"Proceedings of the 34th Annual Hawaii International Conference on System Sciences","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Speech transcript analysis for automatic search\",\"authors\":\"A. Coden\",\"doi\":\"10.1109/HICSS.2001.926473\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We address the problem of finding collateral information pertinent to a live television broadcast in real time. The solution starts with a text transcript of the broadcast generated by an automatic speech recognition system. Speaker independent speech recognition technology, even when tailored for a broadcast scenario, generally produces transcripts with relatively low accuracy. Given this limitation, we have developed algorithms that can determine the essence of the broadcast from these transcripts. Specifically, we extract named entities, topics, and sentence types from the transcript and use them to automatically generate both structured and unstructured search queries. A novel distance-ranking algorithm is used to select relevant information from the search results. The whole process is performed online and the query results (i.e., the collateral information) are added to the broadcast stream.\",\"PeriodicalId\":201648,\"journal\":{\"name\":\"Proceedings of the 34th Annual Hawaii International Conference on System Sciences\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-01-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 34th Annual Hawaii International Conference on System Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HICSS.2001.926473\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 34th Annual Hawaii International Conference on System Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HICSS.2001.926473","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We address the problem of finding collateral information pertinent to a live television broadcast in real time. The solution starts with a text transcript of the broadcast generated by an automatic speech recognition system. Speaker independent speech recognition technology, even when tailored for a broadcast scenario, generally produces transcripts with relatively low accuracy. Given this limitation, we have developed algorithms that can determine the essence of the broadcast from these transcripts. Specifically, we extract named entities, topics, and sentence types from the transcript and use them to automatically generate both structured and unstructured search queries. A novel distance-ranking algorithm is used to select relevant information from the search results. The whole process is performed online and the query results (i.e., the collateral information) are added to the broadcast stream.