{"title":"一种高效的大规模视频类型分类框架","authors":"Ning Zhang, L. Guan","doi":"10.1109/MMSP.2010.5662069","DOIUrl":null,"url":null,"abstract":"Efficient data mining and indexing is important for multimedia analysis and retrieval. In the field of large-scale video analysis, effective genre categorization plays an important role and serves one of the fundamental steps for data mining. Existing works utilize domain-knowledge dependent feature extraction, which is limited from genre diversification as well as data volume scalability. In this paper, we propose a systematic framework for automatically classifying video genres using domain-knowledge independent descriptors in feature extraction, and a bag-of-visualwords (BoW) based model in compact video representation. Scale invariant feature transform (SIFT) local descriptor accelerated by GPU hardware is adopted for feature extraction. BoW model with an innovative codebook generation using bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. Besides the histogram-based distribution in summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. Compared with state of art large-scale genre categorization in [1], the experimental results on a 23-sports dataset demonstrate that our proposed framework achieves a comparable classification accuracy with 27% and 64% expansion in data volume and diversity, respectively.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"An efficient framework on large-scale video genre classification\",\"authors\":\"Ning Zhang, L. Guan\",\"doi\":\"10.1109/MMSP.2010.5662069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient data mining and indexing is important for multimedia analysis and retrieval. In the field of large-scale video analysis, effective genre categorization plays an important role and serves one of the fundamental steps for data mining. Existing works utilize domain-knowledge dependent feature extraction, which is limited from genre diversification as well as data volume scalability. In this paper, we propose a systematic framework for automatically classifying video genres using domain-knowledge independent descriptors in feature extraction, and a bag-of-visualwords (BoW) based model in compact video representation. Scale invariant feature transform (SIFT) local descriptor accelerated by GPU hardware is adopted for feature extraction. BoW model with an innovative codebook generation using bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. Besides the histogram-based distribution in summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. 
Compared with state of art large-scale genre categorization in [1], the experimental results on a 23-sports dataset demonstrate that our proposed framework achieves a comparable classification accuracy with 27% and 64% expansion in data volume and diversity, respectively.\",\"PeriodicalId\":105774,\"journal\":{\"name\":\"2010 IEEE International Workshop on Multimedia Signal Processing\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Workshop on Multimedia Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MMSP.2010.5662069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Workshop on Multimedia Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMSP.2010.5662069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An efficient framework on large-scale video genre classification
Efficient data mining and indexing are important for multimedia analysis and retrieval. In large-scale video analysis, effective genre categorization plays an important role and serves as one of the fundamental steps in data mining. Existing works rely on domain-knowledge-dependent feature extraction, which limits both genre diversity and scalability to large data volumes. In this paper, we propose a systematic framework for automatically classifying video genres that uses domain-knowledge-independent descriptors for feature extraction and a bag-of-visual-words (BoW) model for compact video representation. The scale-invariant feature transform (SIFT) local descriptor, accelerated on GPU hardware, is adopted for feature extraction. A BoW model with an innovative codebook generation scheme based on bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. In addition to the histogram-based distribution for summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. Compared with the state-of-the-art large-scale genre categorization in [1], experimental results on a 23-sport dataset demonstrate that the proposed framework achieves comparable classification accuracy with 27% and 64% expansions in data volume and diversity, respectively.
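The abstract outlines a pipeline of SIFT feature extraction, BoW codebook construction via two-layer K-means, histogram-based video representation, and k-NN classification. The sketch below illustrates one plausible reading of that pipeline using OpenCV and scikit-learn as stand-ins for the paper's GPU-accelerated SIFT and custom clustering code; the dataset layout, parameter values (k_lower, k_upper, k), and the exact form of the bottom-up two-layer clustering are illustrative assumptions, not the authors' implementation, and the mLDA-based representation is omitted.

# Minimal sketch of the SIFT + BoW + k-NN pipeline described in the abstract.
# Assumptions (not from the paper): OpenCV SIFT instead of GPU-accelerated SIFT,
# scikit-learn KMeans/KNeighborsClassifier, and a per-genre-then-global reading
# of the "bottom-up two-layer K-means" codebook construction.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


def extract_sift(frames):
    """Stack SIFT descriptors extracted from a list of grayscale video frames."""
    sift = cv2.SIFT_create()
    descs = []
    for frame in frames:
        _, d = sift.detectAndCompute(frame, None)
        if d is not None:
            descs.append(d)
    return np.vstack(descs) if descs else np.empty((0, 128), dtype=np.float32)


def build_two_layer_codebook(descs_per_genre, k_lower=200, k_upper=500):
    """Bottom-up two-layer K-means: cluster descriptors within each genre first,
    then cluster the pooled genre-level centres into the final codebook
    (one plausible interpretation of the abstract's description)."""
    lower_centres = []
    for descs in descs_per_genre:
        km = KMeans(n_clusters=min(k_lower, len(descs)), n_init=10).fit(descs)
        lower_centres.append(km.cluster_centers_)
    pooled = np.vstack(lower_centres)
    return KMeans(n_clusters=min(k_upper, len(pooled)), n_init=10).fit(pooled)


def bow_histogram(descs, codebook):
    """Quantize descriptors against the codebook; return an L1-normalized histogram."""
    words = codebook.predict(descs)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def classify(train_hists, train_labels, test_hists, k=5):
    """Classification stage: k-NN over the per-video BoW histograms."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train_hists, train_labels)
    return knn.predict(test_hists)

In use, each training video would be reduced to a single BoW histogram over the shared codebook, and a new video would be labeled by the majority genre among its k nearest training histograms; the hypothetical parameter names above (k_lower, k_upper, k) only mark where the paper's actual codebook sizes and neighborhood size would go.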