Speech retrieval for TV news programs by fusing the audio and video information
Xinbo Gao, Jie Li, H. Ji. 6th International Conference on Signal Processing, 2002 (published 2002-08-26). DOI: 10.1109/ICOSP.2002.1179955
A typical news story contains a brief report by the anchor person(s) in the studio, as well as news footage from the field. Our investigation shows that the recognizer performs better when indexing audio from the studio than when indexing audio from the field. To automatically extract the "reliable" audio segments for speech retrieval, we attempt to detect studio-to-field transitions by means of video parsing. Our research is based on 146 news stories collected from the Hong Kong TVB Jade station. Retrieval using the entire audio track gave an average inverse rank (AIR) of 0.759. Incorporating video parsing and performing retrieval on the studio recordings alone gave an AIR of 0.765.
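As a rough illustration of the metric and of studio-only indexing, the following sketch assumes that AIR is the mean, over Q queries, of 1/r_q, where r_q is the rank at which the relevant story for query q was retrieved. The abstract does not define AIR, so this definition is an assumption. The `Segment` type, the "studio"/"field" labels and both function names are hypothetical. The labels stand in for whatever the video-parsing step produces. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


def average_inverse_rank(ranks: Sequence[float]) -> float:
    """Mean of 1/rank over queries (assumed definition of AIR).

    ranks[i] is the 1-based rank of the relevant story for query i.
    A story that was never retrieved can be passed as float("inf"),
    which contributes 1/inf == 0.0 to the sum.
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for r in ranks:
        if r < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / r
    return total / len(ranks)


@dataclass
class Segment:
    """Hypothetical audio segment labelled by a video-based studio/field classifier."""
    start: float      # seconds
    end: float        # seconds
    label: str        # assumed to be "studio" or "field"
    transcript: str   # recognizer output for this segment


def studio_only_text(segments: Sequence[Segment]) -> str:
    """Concatenate transcripts of studio segments only, i.e. the text that would be indexed."""
    return " ".join(s.transcript for s in segments if s.label == "studio")
```

For instance, ranks of 1, 2 and 4 over three queries give AIR = (1 + 1/2 + 1/4) / 3 ≈ 0.583. In this sketch, discarding field segments trades recall of words spoken only in field footage for a cleaner index.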