视频识别的分段多路局部池化

Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI:10.1145/2502081.2502167

Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin J. Cannons, A. Perera, Greg Mori

{"title":"视频识别的分段多路局部池化","authors":"Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin J. Cannons, A. Perera, Greg Mori","doi":"10.1145/2502081.2502167","DOIUrl":null,"url":null,"abstract":"In this work, we address the problem of complex event detection on unconstrained videos. We introduce a novel multi-way feature pooling approach which leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. Our approach uses a set of clusters discovered via unsupervised clustering of segment-level features. Depending on feature characteristics, not only scene-based clusters but also motion/audio-based clusters can be incorporated. Then, every video is represented with multiple descriptors, where each descriptor is designed to relate to one of the pre-built clusters. For classification, intersection kernel SVMs are used where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs. Evaluation on TRECVID'11 MED dataset shows a significant improvement by the proposed approach beyond the state-of-the-art.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Segmental multi-way local pooling for video recognition\",\"authors\":\"Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin J. Cannons, A. Perera, Greg Mori\",\"doi\":\"10.1145/2502081.2502167\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we address the problem of complex event detection on unconstrained videos. We introduce a novel multi-way feature pooling approach which leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. Our approach uses a set of clusters discovered via unsupervised clustering of segment-level features. Depending on feature characteristics, not only scene-based clusters but also motion/audio-based clusters can be incorporated. Then, every video is represented with multiple descriptors, where each descriptor is designed to relate to one of the pre-built clusters. For classification, intersection kernel SVMs are used where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs. Evaluation on TRECVID'11 MED dataset shows a significant improvement by the proposed approach beyond the state-of-the-art.\",\"PeriodicalId\":20448,\"journal\":{\"name\":\"Proceedings of the 21st ACM international conference on Multimedia\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2502081.2502167\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2502081.2502167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

在这项工作中，我们解决了在无约束视频上的复杂事件检测问题。我们引入了一种新的多路特征池化方法，该方法利用了段级信息。该方法简单，可广泛应用于各种视听特征。我们的方法使用一组通过对段级特征的无监督聚类发现的聚类。根据特征特征，不仅可以合并基于场景的集群，还可以合并基于动作/音频的集群。然后，每个视频用多个描述符表示，其中每个描述符被设计为与预构建的集群之一相关。对于分类，使用交集核支持向量机，其中核是通过组合从相应的每簇描述符对计算的多个核来获得的。对TRECVID'11 MED数据集的评估表明，该方法的显著改进超出了最先进的水平。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Segmental multi-way local pooling for video recognition

In this work, we address the problem of complex event detection on unconstrained videos. We introduce a novel multi-way feature pooling approach which leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. Our approach uses a set of clusters discovered via unsupervised clustering of segment-level features. Depending on feature characteristics, not only scene-based clusters but also motion/audio-based clusters can be incorporated. Then, every video is represented with multiple descriptors, where each descriptor is designed to relate to one of the pre-built clusters. For classification, intersection kernel SVMs are used where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs. Evaluation on TRECVID'11 MED dataset shows a significant improvement by the proposed approach beyond the state-of-the-art.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 21st ACM international conference on Multimedia

自引率

0.00%

发文量