Segmental multi-way local pooling for video recognition

Proceedings of the 21st ACM international conference on Multimedia. Publication date: 2013-10-21. DOI: 10.1145/2502081.2502167

Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin J. Cannons, A. Perera, Greg Mori

Citations: 7

Abstract

In this work, we address the problem of complex event detection in unconstrained videos. We introduce a novel multi-way feature pooling approach that leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. It uses a set of clusters discovered by unsupervised clustering of segment-level features. Depending on the characteristics of the feature, these can be not only scene-based clusters but also motion- or audio-based clusters. Every video is then represented by multiple descriptors, each associated with one of the pre-built clusters. For classification, we use intersection kernel SVMs whose kernel is a combination of multiple kernels, each computed from a pair of corresponding per-cluster descriptors. Evaluation on the TRECVID'11 MED dataset shows that the proposed approach significantly improves on the state of the art.
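The pooling and kernel steps described in the abstract can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not specify: segment-level features are nonnegative histograms, the unsupervised clustering is plain k-means, each segment is hard-assigned to its nearest cluster, each per-cluster descriptor is the mean of the segment features assigned to that cluster (all zeros when no segment falls in it), and the per-cluster intersection kernels are combined by a weighted sum with equal weights by default. All function names (`kmeans`, `multiway_pool`, `intersection_kernel`, `combined_kernel`) are hypothetical and the toy data is made up; this is not the authors' implementation.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the centroid closest to x (hard assignment)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], x))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain k-means over segment features.

    Assumption: k-means stands in for the unspecified unsupervised clustering.
    Initialized with the first k points so results are reproducible.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(list(p))
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def multiway_pool(segments: Sequence[Vector], centroids: Sequence[Vector]) -> List[Vector]:
    """One descriptor per cluster: mean of the video's segments assigned to it.

    Assumption: average pooling with hard assignment; clusters that receive
    no segment get an all-zero descriptor.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for s in segments:
        j = nearest(centroids, s)
        counts[j] += 1
        sums[j] = [a + b for a, b in zip(sums[j], s)]
    return [
        [v / counts[j] for v in sums[j]] if counts[j] else sums[j]
        for j in range(len(centroids))
    ]


def intersection_kernel(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_kernel(
    desc_a: Sequence[Vector],
    desc_b: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-cluster intersection kernels.

    Assumption: equal weights by default; the abstract only says the
    per-cluster kernels are combined.
    """
    if weights is None:
        weights = [1.0] * len(desc_a)
    return sum(w * intersection_kernel(a, b) for w, a, b in zip(weights, desc_a, desc_b))


# Toy 3-bin segment histograms pooled from training videos (made-up data).
train_segments = [[1, 0, 0], [0, 0, 1], [0.75, 0.25, 0], [0, 0.25, 0.75]]
centroids = kmeans(train_segments, k=2)
print(centroids)  # → [[0.875, 0.125, 0.0], [0.0, 0.125, 0.875]]

# Each video becomes one descriptor per cluster.
video_a = multiway_pool([[1, 0, 0], [0.5, 0.5, 0]], centroids)
video_b = multiway_pool([[0.5, 0.25, 0.25], [0, 0, 1]], centroids)
print(combined_kernel(video_a, video_b))  # → 0.75
```

Because a nonnegative weighted sum of intersection kernels over nonnegative features is itself a valid kernel, the Gram matrix of `combined_kernel` over all training videos could be passed to an SVM solver that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Using several clusterings at once (for example one built from scene features and one from motion or audio features) would, under this sketch, simply mean appending their per-cluster descriptors to the lists compared by `combined_kernel`.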
