VSSum: A Virtual Surveillance Dataset for Video Summary
Yanfei Zhang, Yulai Xie, Yang Zhang, Yiruo Dai, Fang Ren
Proceedings of the 5th International Conference on Control and Computer Vision, 19 August 2022. DOI: 10.1145/3561613.3561631
Video summarization can greatly reduce the size of a video while retaining most of its content, making it a very promising video analysis technology, especially for surveillance. However, few datasets are available for video summarization because surveillance videos raise privacy concerns and are long and highly redundant. To improve the performance of video summarization in the surveillance domain, we introduce VSSum, a virtual surveillance video dataset that currently contains 1,000 simulated videos of virtual scenarios, each 5 minutes long. Each video contains one predefined abnormal action to be summarized and multiple normal actions. Moreover, randomness is introduced in several aspects, such as character models, camera angles, and the times and positions of actions. The dataset is controllable, diverse, and large-scale. Considering the characteristics of surveillance video, we propose a baseline model for surveillance video summarization called VSSumNet. It uses padding instead of center-cropping and threshold-based frame selection instead of selecting a fixed proportion of frames, and it applies 1D convolution to enhance the temporal continuity of the output. Experimental results show that the baseline outperforms previous methods.
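The abstract describes VSSumNet only at a high level, but the two design choices it names (padding plus threshold-based selection, and a 1D convolution for temporally continuous scores) can be illustrated. The sketch below is our own minimal illustration under stated assumptions, not the authors' implementation: the feature dimension, kernel sizes, padded length, and the 0.5 threshold are hypothetical.

# Minimal sketch (not the authors' VSSumNet): per-frame scores from temporal
# 1D convolutions, zero-padding instead of center-cropping, and a score
# threshold instead of a fixed summary proportion. All sizes are assumptions.
import torch
import torch.nn as nn


class FrameScorer(nn.Module):
    """Scores each frame of a padded feature sequence with temporal 1D convolutions."""

    def __init__(self, feat_dim: int = 1024, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2),  # local temporal context
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=5, padding=2),         # one score per frame
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> scores: (batch, time) in (0, 1)
        x = feats.transpose(1, 2)  # (batch, feat_dim, time) layout expected by Conv1d
        return torch.sigmoid(self.net(x)).squeeze(1)


def pad_features(feats: torch.Tensor, target_len: int) -> torch.Tensor:
    """Zero-pad a (time, feat_dim) sequence to target_len instead of center-cropping it."""
    t = feats.shape[0]
    if t >= target_len:
        return feats[:target_len]
    pad = torch.zeros(target_len - t, feats.shape[1])
    return torch.cat([feats, pad], dim=0)


if __name__ == "__main__":
    # e.g. per-frame features of a roughly 5-minute clip, padded to a fixed length
    frames = pad_features(torch.randn(4200, 1024), target_len=4500)
    scores = FrameScorer()(frames.unsqueeze(0))      # shape (1, 4500)
    summary_mask = scores > 0.5                      # threshold selection, not a fixed proportion
    print(summary_mask.sum().item(), "frames selected for the summary")

Thresholding (rather than always keeping, say, 15% of frames) lets the summary length adapt to how much abnormal activity a clip contains, and the 1D convolutions smooth the scores so that selected frames tend to form contiguous segments.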