Yuh-Lin Chang, Wenjun Zeng, I. Kamel, Rafael Alonso
Title: Integrated image and speech analysis for content-based video indexing
DOI: 10.1109/MMCS.1996.534992
Published in: Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems
Publication date: 1996-06-17
Citations: 101
Abstract
We study an important problem in multimedia databases, namely the automatic extraction of indexing information from raw data based on video content. The goal of our research project is to develop a prototype system for automatic indexing of sports videos. The novelty of our work is that we propose to integrate speech understanding and image analysis algorithms for extracting information. The main thrust of this work comes from the observation that in news or sports video indexing, speech analysis is usually more efficient at detecting events than image analysis. Therefore, in our system, the audio processing modules are applied first to locate candidate events in the data. This information is then passed to the video processing modules, which analyze the video further. The final products of video analysis take the form of pointers to the locations of interesting events in a video. Our algorithms have been tested extensively on real TV programs, and the results are presented and discussed.
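The two-stage audio-then-video pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the keyword set, the segment representation, the `frame_score` confirmation function, and all names are assumptions introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class EventPointer:
    """Final product of the analysis: a pointer to an event's location."""
    start_s: float  # event start time in the video, in seconds
    end_s: float    # event end time, in seconds
    label: str

def audio_candidates(transcript):
    """Stage 1 (speech analysis): flag segments whose transcript contains
    event keywords. The keyword list is a hypothetical example."""
    keywords = {"touchdown", "goal", "score"}
    return [(t0, t1) for t0, t1, words in transcript
            if keywords & set(words)]

def video_confirm(candidates, frame_score):
    """Stage 2 (image analysis): run the more expensive visual check only
    on the audio candidates, and keep those scoring above a threshold."""
    return [EventPointer(t0, t1, "event")
            for t0, t1 in candidates
            if frame_score(t0, t1) > 0.5]

# Toy data: (start_s, end_s, recognized words) per speech segment.
transcript = [
    (0.0, 10.0, ["welcome", "back"]),
    (10.0, 20.0, ["what", "a", "touchdown"]),
    (20.0, 30.0, ["commercial", "break"]),
]

cands = audio_candidates(transcript)
# frame_score stubbed out; a real system would analyze the frames here.
pointers = video_confirm(cands, lambda t0, t1: 0.9)
print(pointers)
```

The design point the sketch captures is the ordering: the cheap, more reliable audio pass prunes the search space so the video modules only examine a few candidate intervals instead of the whole program.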