{"title":"一种高效的大规模视频类型分类框架","authors":"Ning Zhang, L. Guan","doi":"10.1109/MMSP.2010.5662069","DOIUrl":null,"url":null,"abstract":"Efficient data mining and indexing is important for multimedia analysis and retrieval. In the field of large-scale video analysis, effective genre categorization plays an important role and serves one of the fundamental steps for data mining. Existing works utilize domain-knowledge dependent feature extraction, which is limited from genre diversification as well as data volume scalability. In this paper, we propose a systematic framework for automatically classifying video genres using domain-knowledge independent descriptors in feature extraction, and a bag-of-visualwords (BoW) based model in compact video representation. Scale invariant feature transform (SIFT) local descriptor accelerated by GPU hardware is adopted for feature extraction. BoW model with an innovative codebook generation using bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. Besides the histogram-based distribution in summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. Compared with state of art large-scale genre categorization in [1], the experimental results on a 23-sports dataset demonstrate that our proposed framework achieves a comparable classification accuracy with 27% and 64% expansion in data volume and diversity, respectively.","PeriodicalId":105774,"journal":{"name":"2010 IEEE International Workshop on Multimedia Signal Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"An efficient framework on large-scale video genre classification\",\"authors\":\"Ning Zhang, L. Guan\",\"doi\":\"10.1109/MMSP.2010.5662069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient data mining and indexing is important for multimedia analysis and retrieval. In the field of large-scale video analysis, effective genre categorization plays an important role and serves one of the fundamental steps for data mining. Existing works utilize domain-knowledge dependent feature extraction, which is limited from genre diversification as well as data volume scalability. In this paper, we propose a systematic framework for automatically classifying video genres using domain-knowledge independent descriptors in feature extraction, and a bag-of-visualwords (BoW) based model in compact video representation. Scale invariant feature transform (SIFT) local descriptor accelerated by GPU hardware is adopted for feature extraction. BoW model with an innovative codebook generation using bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. Besides the histogram-based distribution in summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. 
Compared with state of art large-scale genre categorization in [1], the experimental results on a 23-sports dataset demonstrate that our proposed framework achieves a comparable classification accuracy with 27% and 64% expansion in data volume and diversity, respectively.\",\"PeriodicalId\":105774,\"journal\":{\"name\":\"2010 IEEE International Workshop on Multimedia Signal Processing\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Workshop on Multimedia Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MMSP.2010.5662069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Workshop on Multimedia Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMSP.2010.5662069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An efficient framework on large-scale video genre classification
Efficient data mining and indexing are important for multimedia analysis and retrieval. In large-scale video analysis, effective genre categorization plays an important role and serves as one of the fundamental steps in data mining. Existing works rely on domain-knowledge-dependent feature extraction, which limits both genre diversity and scalability to large data volumes. In this paper, we propose a systematic framework for automatically classifying video genres that uses domain-knowledge-independent descriptors for feature extraction and a bag-of-visual-words (BoW) model for compact video representation. The scale-invariant feature transform (SIFT) local descriptor, accelerated on GPU hardware, is adopted for feature extraction. A BoW model with an innovative codebook generation scheme based on bottom-up two-layer K-means clustering is proposed to abstract the video characteristics. In addition to the histogram-based distribution for summarizing video data, a modified latent Dirichlet allocation (mLDA) based distribution is also introduced. At the classification stage, a k-nearest neighbor (k-NN) classifier is employed. Compared with the state-of-the-art large-scale genre categorization in [1], experimental results on a 23-sport dataset demonstrate that the proposed framework achieves comparable classification accuracy with 27% and 64% expansions in data volume and diversity, respectively.
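The abstract outlines a pipeline of SIFT feature extraction, BoW codebook construction via two-layer K-means, histogram-based video representation, and k-NN classification. The sketch below illustrates one plausible reading of that pipeline using OpenCV and scikit-learn as stand-ins for the paper's GPU-accelerated SIFT and custom clustering code; the dataset layout, parameter values (k_lower, k_upper, k), and the exact form of the bottom-up two-layer clustering are illustrative assumptions, not the authors' implementation, and the mLDA-based representation is omitted.

# Minimal sketch of the SIFT + BoW + k-NN pipeline described in the abstract.
# Assumptions (not from the paper): OpenCV SIFT instead of GPU-accelerated SIFT,
# scikit-learn KMeans/KNeighborsClassifier, and a per-genre-then-global reading
# of the "bottom-up two-layer K-means" codebook construction.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


def extract_sift(frames):
    """Stack SIFT descriptors extracted from a list of grayscale video frames."""
    sift = cv2.SIFT_create()
    descs = []
    for frame in frames:
        _, d = sift.detectAndCompute(frame, None)
        if d is not None:
            descs.append(d)
    return np.vstack(descs) if descs else np.empty((0, 128), dtype=np.float32)


def build_two_layer_codebook(descs_per_genre, k_lower=200, k_upper=500):
    """Bottom-up two-layer K-means: cluster descriptors within each genre first,
    then cluster the pooled genre-level centres into the final codebook
    (one plausible interpretation of the abstract's description)."""
    lower_centres = []
    for descs in descs_per_genre:
        km = KMeans(n_clusters=min(k_lower, len(descs)), n_init=10).fit(descs)
        lower_centres.append(km.cluster_centers_)
    pooled = np.vstack(lower_centres)
    return KMeans(n_clusters=min(k_upper, len(pooled)), n_init=10).fit(pooled)


def bow_histogram(descs, codebook):
    """Quantize descriptors against the codebook; return an L1-normalized histogram."""
    words = codebook.predict(descs)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def classify(train_hists, train_labels, test_hists, k=5):
    """Classification stage: k-NN over the per-video BoW histograms."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(train_hists, train_labels)
    return knn.predict(test_hists)

In use, each training video would be reduced to a single BoW histogram over the shared codebook, and a new video would be labeled by the majority genre among its k nearest training histograms; the hypothetical parameter names above (k_lower, k_upper, k) only mark where the paper's actual codebook sizes and neighborhood size would go.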