Classification and Temporal Localization of Robbery Events in CCTV Videos through Multi-Stream Deep Networks

2019 IEEE 16th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT and AI (HONET-ICT)
Pub Date: 2019-10-01
DOI: 10.1109/HONET.2019.8908040

Zakia Yahya, M. M. Ullah

Citations: 3

Abstract

Robbery is an open social problem. To help tackle it, this paper proposes multi-stream deep networks for the classification and temporal localization of robbery events in CCTV videos. In our multi-stream architecture, each stream consists of a pre-trained 3D ConvNet combined with an LSTM, followed by a softmax layer. In particular, we investigate three streams based on three different types of input: (a) RGB data, (b) optical flow, and (c) foreground masks. Each stream is trained independently, and the final scores are averaged to produce predictions. To test the approach, we compile a robbery dataset from YouTube containing 124 untrimmed CCTV videos. Empirical comparison with several state-of-the-art methods demonstrates the promise of our multi-stream model in both the classification and temporal localization tasks.
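
The score-averaging step described in the abstract can be illustrated with a minimal sketch. It assumes each stream has already produced per-clip softmax probabilities for two classes (normal, robbery). The function names, the example scores, the 0.5 threshold, and the rule that groups consecutive above-threshold clips into segments are all hypothetical choices made for illustration; the abstract does not specify how temporal localization is derived from the fused scores.

```python
from typing import Dict, List, Tuple

Scores = List[List[float]]  # one row of class probabilities per clip


def average_stream_scores(streams: Dict[str, Scores]) -> Scores:
    """Late fusion: average class probabilities across streams, clip by clip."""
    per_stream = list(streams.values())
    n_streams = len(per_stream)
    fused = []
    for clip_rows in zip(*per_stream):  # rows for the same clip, one per stream
        fused.append([sum(col) / n_streams for col in zip(*clip_rows)])
    return fused


def robbery_segments(fused: Scores, robbery_idx: int = 1,
                     threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive clips whose fused robbery score exceeds the threshold.

    Returns inclusive (start_clip, end_clip) index pairs. This grouping rule
    is an illustrative assumption, not the procedure from the paper.
    """
    segments, start = [], None
    for i, row in enumerate(fused):
        if row[robbery_idx] > threshold:
            if start is None:
                start = i  # a new candidate segment begins here
        elif start is not None:
            segments.append((start, i - 1))  # segment ended on the previous clip
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start, len(fused) - 1))
    return segments


# Hypothetical softmax outputs [p(normal), p(robbery)] for four clips.
streams = {
    "rgb":        [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]],
    "flow":       [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.6, 0.4]],
    "foreground": [[0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0.8, 0.2]],
}
fused = average_stream_scores(streams)
segments = robbery_segments(fused)
print(segments)
```

Because the streams are trained independently, averaging their scores at prediction time requires no joint training; a weighted average would be a natural variation, though the abstract only mentions plain averaging.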

