Scene change detection based on audio and video content analysis

Proceedings Fifth International Conference on Computational Intelligence and Multimedia Applications. ICCIMA 2003. Published: 2003-09-27. DOI: 10.1109/ICCIMA.2003.1238130

Yingying Zhu, Dongru Zhou


Citations: 15

Abstract


Scene change detection is an essential step in automatic, content-based video indexing, retrieval, and browsing. This paper presents a robust scene change detection method that analyzes both audio and visual information and uses their inter-relations and coincidence to identify video scenes semantically. Audio analysis segments the audio source into four semantic classes: silence, speech, music, and environmental sound. Speech is further divided into segments by speaker. Meanwhile, visual analysis partitions the video source into shots. Segmentation from a single source is in some cases suboptimal. Combining visual and audio features improves scene extraction accuracy and yields more semantically meaningful segments. Experimental results indicate that the method is suitable for content-based video indexing and retrieval.
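The abstract does not state how audio and visual boundaries are combined, so the following is only a minimal sketch of one plausible coincidence rule, not the authors' method. It assumes that shot cuts and audio class or speaker changes are already available as timestamps in seconds, and it keeps a shot cut as a scene change only when an audio change falls within a tolerance window around it. The function name `fuse_boundaries`, the `tolerance` parameter, and the sample timestamps are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def fuse_boundaries(shot_cuts: Sequence[float],
                    audio_changes: Sequence[float],
                    tolerance: float = 0.5) -> List[float]:
    """Keep shot cuts that coincide with an audio change.

    A cut is kept when some audio change lies within `tolerance`
    seconds of it. This rule is an illustrative assumption; the
    paper's actual fusion rule is not given in the abstract.
    """
    audio_sorted = sorted(audio_changes)
    scene_changes = []
    for cut in shot_cuts:
        i = bisect_left(audio_sorted, cut)
        # The nearest audio change is either just before or just after the cut.
        neighbours = audio_sorted[max(i - 1, 0):i + 1]
        if any(abs(cut - t) <= tolerance for t in neighbours):
            scene_changes.append(cut)
    return scene_changes


# Hypothetical timestamps (seconds), for illustration only.
shot_cuts = [4.0, 12.5, 20.0, 31.2]
audio_changes = [12.3, 25.0, 31.6]
scenes = fuse_boundaries(shot_cuts, audio_changes)
print(scenes)
```

With these sample values, the cuts at 12.5 s and 31.2 s are kept because an audio change lies within 0.5 s of each, while the cuts at 4.0 s and 20.0 s are treated as shot changes inside a continuing scene. A wider `tolerance` accepts looser audio-visual alignment at the cost of more false scene boundaries.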

