Video Summarization: How to Use Deep-Learned Features Without a Large-Scale Dataset

2018 9th International Conference on Awareness Science and Technology (iCAST). Pub Date: 2018-08-24. DOI: 10.29007/21Q3

Didik Purwanto, Yie-Tarng Chen, Wen-Hsien Fang, Wen-Chi Wu

Citations: 3

Abstract

This paper proposes a framework that combines deep-learned features with conventional machine learning models whose objective functions are optimized by quadratic programming or quasi-Newton methods, rather than an end-to-end deep learning approach trained with variants of stochastic gradient descent. First, a temporal segmentation algorithm is developed that uses a learning-to-rank scheme to detect abrupt changes in frame appearance in a video sequence. Next, a peak-searching algorithm, statistics-sensitive non-linear iterative peak-clipping (SNIP), is applied to find the local maxima of the filtered video sequence after rank pooling, where each local maximum corresponds to a key frame in the video. Simulations show that the new approach outperforms the main state-of-the-art methods on four public video datasets.
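The key-frame step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the SNIP variant, the window width, or the filtering used, so the clipping rule below is the basic textbook form of iterative peak clipping, and the names snip_baseline, key_frame_indices, max_half_width, and min_height are hypothetical. The input scores stand in for the per-frame signal obtained after rank pooling and filtering.

```python
def snip_baseline(signal, max_half_width):
    """Estimate a slowly varying baseline by iterative peak clipping.

    Assumption: basic SNIP-style rule. For each half-width p, every sample
    is replaced by the smaller of itself and the mean of its two
    neighbours at distance p.
    """
    baseline = list(signal)
    n = len(baseline)
    for p in range(1, max_half_width + 1):
        previous = baseline[:]  # clip against the previous iteration only
        for i in range(p, n - p):
            neighbour_mean = 0.5 * (previous[i - p] + previous[i + p])
            baseline[i] = min(previous[i], neighbour_mean)
    return baseline


def key_frame_indices(scores, max_half_width=3, min_height=0.5):
    """Return indices of local maxima that stand out above the baseline.

    min_height is a hypothetical threshold, not a value from the paper.
    """
    baseline = snip_baseline(scores, max_half_width)
    residual = [s - b for s, b in zip(scores, baseline)]
    peaks = []
    for i in range(1, len(residual) - 1):
        is_local_max = residual[i] > residual[i - 1] and residual[i] >= residual[i + 1]
        if is_local_max and residual[i] >= min_height:
            peaks.append(i)
    return peaks


# Toy signal: a linear drift with two bumps, at frames 3 and 8.
toy_scores = [0, 1, 2, 8, 4, 5, 6, 7, 12, 9, 10, 11]
print(key_frame_indices(toy_scores))
```

On this toy signal the clipped baseline follows the linear drift, so only the two bumps remain in the residual and are reported as candidate key frames. A real pipeline would tune the half-width and threshold to the video's frame rate and score scale.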
