Video-zilla: An Indexing Layer for Large-Scale Video Analytics

Proceedings of the 2022 International Conference on Management of Data. Pub Date: 2022-06-10. DOI: 10.1145/3514221.3517840

Bo Hu, Peizhen Guo, Wenjun Hu


Citations: 0

Abstract


The pervasive deployment of surveillance cameras poses enormous scalability challenges for video analytics systems that operate over many camera feeds. Currently, few indexing tools organize video feeds beyond what a standard file system provides. Recent video analytics systems implement application-specific frame profiling and sampling techniques to reduce the number of raw videos processed, leveraging frame-level redundancy or manually labeled spatial-temporal correlation between cameras. This paper presents Video-zilla, a standalone indexing layer that sits between video query systems and a video store to organize video data. We propose a video data unit abstraction, the semantic video stream (SVS), based on a notion of distance between objects in the video. SVS implicitly captures scenes, a concept that is missing from current video content characterization and that offers a middle ground between individual frames and an entire camera feed. We then build a hierarchical index that exposes semantic similarity both within and across camera feeds, so that Video-zilla can quickly cluster video feeds by their content semantics without manual labeling. We implement and evaluate Video-zilla in three use cases: object identification queries, clustering for training specialized DNNs, and archival services. In all three cases, Video-zilla reduces the time complexity of inter-camera video analytics from linear in the number of cameras to sublinear, and reduces query resource usage by up to 14× compared to using the frame-level or spatial-temporal similarity built into existing query systems.
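
As a rough illustration only, the idea of grouping frames into scene-like units and then indexing those units across cameras might be sketched as follows. The abstract does not specify the distance metric or how the index is built, so everything here is an assumption rather than the authors' implementation: the L1 distance between object-class histograms, the fixed threshold, the greedy clustering, and all function names (`histogram`, `segment_feed`, `build_index`, `lookup`) are hypothetical.

```python
from collections import Counter

def histogram(labels, vocab):
    """Normalized object-class histogram for the objects seen in one frame."""
    counts = Counter(labels)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in vocab]

def l1(a, b):
    """L1 distance between histograms (an assumed stand-in for the paper's distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def segment_feed(frames, vocab, threshold):
    """Split one feed into segments; each segment is summarized by its mean histogram."""
    segments, current = [], []
    for labels in frames:
        h = histogram(labels, vocab)
        # Start a new segment when the frame drifts too far from the running mean.
        if current and l1(h, mean(current)) > threshold:
            segments.append(mean(current))
            current = []
        current.append(h)
    if current:
        segments.append(mean(current))
    return segments

def build_index(segments_by_camera, threshold):
    """Greedy clustering across cameras: each cluster keeps a centroid and its members."""
    clusters = []
    for camera, segments in segments_by_camera.items():
        for idx, vec in enumerate(segments):
            for cluster in clusters:
                if l1(vec, cluster["centroid"]) <= threshold:
                    cluster["members"].append((camera, idx, vec))
                    cluster["centroid"] = mean([m[2] for m in cluster["members"]])
                    break
            else:
                # No existing cluster is close enough, so open a new one.
                clusters.append({"centroid": vec, "members": [(camera, idx, vec)]})
    return clusters

def lookup(clusters, query_vec):
    """Compare the query to centroids only, then return that cluster's segments."""
    best = min(clusters, key=lambda c: l1(query_vec, c["centroid"]))
    return [(camera, idx) for camera, idx, _ in best["members"]]

# Toy data: per-frame lists of detected object labels for two hypothetical cameras.
vocab = ["car", "person", "bike"]
feeds = {
    "cam_a": [["car", "car"], ["car"], ["person", "person"], ["person"]],
    "cam_b": [["person"], ["person", "person"], ["car", "car", "car"]],
}
segments_by_camera = {cam: segment_feed(f, vocab, 0.5) for cam, f in feeds.items()}
clusters = build_index(segments_by_camera, 0.5)
hits = lookup(clusters, histogram(["person"], vocab))
print(hits)
```

In this toy setup a query is compared against one centroid per cluster instead of every segment from every camera, which is the kind of saving a hierarchical index aims for; the real system's structure and guarantees may differ.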
