Video-zilla: An Indexing Layer for Large-Scale Video Analytics
Bo Hu, Peizhen Guo, Wenjun Hu
Proceedings of the 2022 International Conference on Management of Data, published 2022-06-10. DOI: 10.1145/3514221.3517840
Pervasive deployment of surveillance cameras today poses enormous scalability challenges to video analytics systems that operate over many camera feeds. Currently, few indexing tools organize video feeds beyond what a standard file system provides. Recent video analytics systems implement application-specific frame profiling and sampling techniques to reduce the amount of raw video processed, leveraging frame-level redundancy or manually labeled spatial-temporal correlation between cameras. This paper presents Video-zilla, a standalone indexing layer between video query systems and a video store that organizes video data. We propose a video data unit abstraction, the semantic video stream (SVS), based on a notion of distance between objects in the video. An SVS implicitly captures scenes, a granularity that current video content characterization lacks and that forms a middle ground between individual frames and an entire camera feed. We then build a hierarchical index that exposes semantic similarity both within and across camera feeds, so that Video-zilla can quickly cluster video feeds by content semantics without manual labeling. We implement and evaluate Video-zilla in three use cases: object identification queries, clustering for training specialized DNNs, and archival services. In all three cases, Video-zilla reduces the time complexity of inter-camera video analytics from linear in the number of cameras to sublinear, and reduces query resource usage by up to 14× compared to using the frame-level or spatial-temporal similarity built into existing query systems.
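The following minimal Python sketch shows how clustering per-feed summaries can let a query avoid scanning every camera. It rests on several assumptions that the abstract does not state. Each SVS is summarized by a single feature vector, for example a mean object embedding. Euclidean distance stands in for the paper's object-distance notion. A greedy threshold ("leader") clustering over two levels stands in for Video-zilla's actual hierarchical index construction. The names `SVS`, `Cluster`, `TwoLevelIndex`, and `threshold` are hypothetical illustrations, not the paper's API.

```python
import math
from dataclasses import dataclass, field

Vector = tuple[float, ...]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: an assumed stand-in for the paper's object-distance notion."""
    return math.dist(a, b)


@dataclass
class SVS:
    camera_id: str
    feature: Vector  # hypothetical summary vector for one semantic video stream


@dataclass
class Cluster:
    centroid: Vector
    members: list[SVS] = field(default_factory=list)

    def add(self, svs: SVS) -> None:
        self.members.append(svs)
        n = len(self.members)
        # Incremental mean keeps the centroid equal to the average member feature.
        self.centroid = tuple(c + (x - c) / n for c, x in zip(self.centroid, svs.feature))


class TwoLevelIndex:
    """Hypothetical two-level index: SVSs from many cameras grouped into clusters."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.clusters: list[Cluster] = []

    def insert(self, svs: SVS) -> None:
        # Join the nearest cluster if it is close enough, otherwise start a new one.
        best = min(self.clusters, key=lambda c: distance(c.centroid, svs.feature), default=None)
        if best is not None and distance(best.centroid, svs.feature) <= self.threshold:
            best.add(svs)
        else:
            self.clusters.append(Cluster(centroid=svs.feature, members=[svs]))

    def query(self, feature: Vector) -> tuple[set[str], int]:
        """Return cameras in the nearest cluster and how many centroids were compared."""
        if not self.clusters:
            return set(), 0
        best = min(self.clusters, key=lambda c: distance(c.centroid, feature))
        return {s.camera_id for s in best.members}, len(self.clusters)


index = TwoLevelIndex(threshold=1.0)
for svs in [
    SVS("cam-1", (0.0, 0.0)),
    SVS("cam-2", (0.2, 0.1)),
    SVS("cam-3", (5.0, 5.0)),
    SVS("cam-4", (5.1, 4.9)),
]:
    index.insert(svs)
cams, probes = index.query((0.1, 0.0))
print(sorted(cams), probes)  # prints ['cam-1', 'cam-2'] 2
```

In this sketch, a query compares the probe against cluster centroids rather than against every camera's SVS, so its cost tracks the number of clusters. That cost is sublinear in the number of cameras only when feeds with similar content collapse into far fewer clusters than there are cameras, which this sketch does not guarantee.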