Classification and Temporal Localization of Robbery Events in CCTV Videos through Multi-Stream Deep Networks

2019 IEEE 16th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT and AI (HONET-ICT). Pub Date: 2019-10-01. DOI: 10.1109/HONET.2019.8908040

Zakia Yahya, M. M. Ullah


Citations: 3

Abstract



Robbery is an open social problem. To help tackle it, we propose multi-stream deep networks for the classification and temporal localization of robbery events in CCTV videos. In our multi-stream architecture, each stream consists of a pre-trained 3D ConvNet combined with an LSTM, followed by a softmax layer. In particular, we investigate three streams based on three different types of input: (a) RGB data, (b) optical flow, and (c) foreground masks. Each stream is trained independently, and the final scores are averaged for prediction. To test the approach, we compile a robbery dataset from YouTube that contains 124 untrimmed CCTV videos. Empirical comparison with several state-of-the-art methods demonstrates the promise of our multi-stream model in both the classification and temporal localization tasks.
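The score-averaging step described in the abstract can be illustrated with a minimal sketch. It shows only the late fusion of per-stream softmax scores, not the 3D ConvNet or LSTM themselves. The two-class label order, the logit values, and the helper names `softmax` and `fuse_streams` are assumptions made for illustration and do not come from the paper.

```python
import math

def softmax(logits):
    """Turn raw per-class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(stream_logits):
    """Late fusion: average the per-class softmax scores across streams."""
    scores = [softmax(logits) for logits in stream_logits]
    n_classes = len(scores[0])
    return [sum(s[c] for s in scores) / len(scores) for c in range(n_classes)]

# Hypothetical logits for one clip; assumed class order is [normal, robbery].
clip_logits = {
    "rgb": [0.2, 1.1],
    "optical_flow": [0.5, 0.4],
    "foreground_mask": [-0.3, 0.9],
}

fused = fuse_streams(list(clip_logits.values()))
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

In this made-up example the optical-flow stream slightly favors the first class, but the averaged score still selects the second class, which shows how independently trained streams can compensate for one another.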
