Scene change detection based on audio and video content analysis

Yingying Zhu, Dongru Zhou. Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2003), published 2003-09-27. DOI: 10.1109/ICCIMA.2003.1238130
Scene change detection is an essential step toward automatic, content-based video indexing, retrieval, and browsing. This paper presents a robust scene change detection method that analyzes both the audio and the visual information sources and takes into account their inter-relations and temporal coincidence to identify semantically meaningful video scenes. Audio analysis segments the audio source into four semantic classes: silence, speech, music, and environmental sound. Speech segments are further divided according to speaker. Meanwhile, visual analysis partitions the video source into shots. Because segmentation based on a single source is in some cases suboptimal, visual and audio features are combined, which improves scene extraction accuracy and yields more semantically meaningful segmentations. Experimental results show that the method is suitable for content-based video indexing and retrieval.
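The abstract does not say exactly how the audio and visual boundaries are combined. The following is a minimal Python sketch of one plausible reading of "coincidence": a shot cut is kept as a scene change only when an audio change (a switch between silence, speech, music, and environmental sound, or between speakers) falls within a small time window around it. The segment tuple format, the `speech:<speaker>` labels, the function names `audio_change_points` and `scene_changes`, and the 0.5 s default tolerance are all hypothetical assumptions, not the authors' method; the upstream audio classifier, speaker segmentation, and shot detector are assumed to exist and to supply timestamps in seconds.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical audio segment: (start_sec, end_sec, label). Labels are one of the four
# classes named in the abstract; speech carries an assumed speaker tag, e.g. "speech:A".
AudioSegment = Tuple[float, float, str]


def audio_change_points(segments: List[AudioSegment]) -> List[float]:
    """Start times of segments whose label (class or speaker) differs from the previous one."""
    ordered = sorted(segments)  # sort by start time
    return [cur[0] for prev, cur in zip(ordered, ordered[1:]) if cur[2] != prev[2]]


def scene_changes(shot_cuts: List[float], audio_changes: List[float],
                  tolerance: float = 0.5) -> List[float]:
    """Keep only shot cuts that have an audio change within +/- tolerance seconds."""
    audio = sorted(audio_changes)
    result = []
    for t in sorted(shot_cuts):
        i = bisect_left(audio, t - tolerance)  # first audio change >= t - tolerance
        if i < len(audio) and audio[i] <= t + tolerance:
            result.append(t)
    return result


if __name__ == "__main__":
    segments = [(0.0, 12.0, "music"), (12.0, 30.0, "speech:A"),
                (30.0, 45.0, "speech:B"), (45.0, 50.0, "silence")]
    changes = audio_change_points(segments)            # [12.0, 30.0, 45.0]
    shots = [5.0, 12.2, 20.0, 29.8, 44.0]
    print(scene_changes(shots, changes))               # [12.2, 29.8]
```

The tolerance window absorbs small misalignments between the two detectors; a shot cut with no nearby audio change (for example, a camera switch within one continuous conversation) is treated as an intra-scene cut under this assumption.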