VSSum: A Virtual Surveillance Dataset for Video Summary
Yanfei Zhang, Yulai Xie, Yang Zhang, Yiruo Dai, Fang Ren
Proceedings of the 5th International Conference on Control and Computer Vision, 19 August 2022. DOI: 10.1145/3561613.3561631
Video summarization can greatly reduce the size of a video while retaining most of its content, making it a very promising video analysis technology, especially for surveillance. However, few datasets are available for video summarization because surveillance videos raise privacy concerns and are long and highly redundant. To improve the performance of video summarization in the surveillance domain, we introduce VSSum, a virtual surveillance video dataset that currently contains 1,000 simulated videos of virtual scenarios, each 5 minutes long. Each video contains one predefined abnormal action to be summarized and multiple normal actions. Moreover, randomness is introduced in several aspects, such as character models, camera angles, and the times and positions of actions. The dataset is controllable, diverse, and large-scale. Considering the characteristics of surveillance video, we propose a baseline model for surveillance video summarization called VSSumNet. It uses padding instead of center-cropping and threshold-based frame selection instead of selecting a fixed proportion of frames, and it applies 1D convolution to enhance the temporal continuity of the output. Experimental results show that the baseline outperforms previous methods.
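The abstract describes VSSumNet only at a high level, but the two design choices it names (padding plus threshold-based selection, and a 1D convolution for temporally continuous scores) can be illustrated. The sketch below is our own minimal illustration under stated assumptions, not the authors' implementation: the feature dimension, kernel sizes, padded length, and the 0.5 threshold are hypothetical.

# Minimal sketch (not the authors' VSSumNet): per-frame scores from temporal
# 1D convolutions, zero-padding instead of center-cropping, and a score
# threshold instead of a fixed summary proportion. All sizes are assumptions.
import torch
import torch.nn as nn


class FrameScorer(nn.Module):
    """Scores each frame of a padded feature sequence with temporal 1D convolutions."""

    def __init__(self, feat_dim: int = 1024, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2),  # local temporal context
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=5, padding=2),         # one score per frame
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> scores: (batch, time) in (0, 1)
        x = feats.transpose(1, 2)  # (batch, feat_dim, time) layout expected by Conv1d
        return torch.sigmoid(self.net(x)).squeeze(1)


def pad_features(feats: torch.Tensor, target_len: int) -> torch.Tensor:
    """Zero-pad a (time, feat_dim) sequence to target_len instead of center-cropping it."""
    t = feats.shape[0]
    if t >= target_len:
        return feats[:target_len]
    pad = torch.zeros(target_len - t, feats.shape[1])
    return torch.cat([feats, pad], dim=0)


if __name__ == "__main__":
    # e.g. per-frame features of a roughly 5-minute clip, padded to a fixed length
    frames = pad_features(torch.randn(4200, 1024), target_len=4500)
    scores = FrameScorer()(frames.unsqueeze(0))      # shape (1, 4500)
    summary_mask = scores > 0.5                      # threshold selection, not a fixed proportion
    print(summary_mask.sum().item(), "frames selected for the summary")

Thresholding (rather than always keeping, say, 15% of frames) lets the summary length adapt to how much abnormal activity a clip contains, and the 1D convolutions smooth the scores so that selected frames tend to form contiguous segments.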