Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report]

Enhao Zhang, Nicole Sullivan, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska

arXiv - CS - Databases · 2024-08-05 · https://doi.org/arxiv-2408.02243
Abstract
Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF generates multiple candidate UDFs and uses active learning to efficiently select the best one. With this self-enhancing capability, VOCAL-UDF significantly improves query performance across three video datasets.
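To make the program-based UDF idea concrete, the sketch below shows what an LLM-generated Python UDF for a spatial concept might look like, along with a simple routine that picks among candidate UDFs by agreement with user-labeled examples. All function names, the bounding-box representation, and the selection scheme are hypothetical illustrations, not taken from the paper; in particular, VOCAL-UDF uses active learning to choose which examples to label, whereas this sketch assumes the labels are already given.

```python
# Hypothetical sketch of program-based UDFs and candidate selection.
# None of these names come from VOCAL-UDF; they only illustrate the idea.

def is_left_of(obj_a: dict, obj_b: dict) -> bool:
    """Candidate UDF 1: A is left of B if A's right edge is left of B's left edge."""
    return obj_a["x2"] <= obj_b["x1"]

def is_left_of_center(obj_a: dict, obj_b: dict) -> bool:
    """Candidate UDF 2: compare bounding-box horizontal centers instead."""
    center = lambda o: (o["x1"] + o["x2"]) / 2
    return center(obj_a) < center(obj_b)

def select_best_udf(candidates, labeled_examples):
    """Return the candidate that agrees most with the labeled examples.

    Each labeled example is (obj_a, obj_b, expected_bool)."""
    def accuracy(udf):
        hits = sum(udf(a, b) == label for a, b, label in labeled_examples)
        return hits / len(labeled_examples)
    return max(candidates, key=accuracy)

# Toy bounding boxes: {"x1": left edge, "x2": right edge}
examples = [
    ({"x1": 0, "x2": 10}, {"x1": 20, "x2": 30}, True),    # clearly left
    ({"x1": 0, "x2": 25}, {"x1": 20, "x2": 30}, True),    # overlapping, but left of center
    ({"x1": 40, "x2": 50}, {"x1": 20, "x2": 30}, False),  # to the right
]
best = select_best_udf([is_left_of, is_left_of_center], examples)
```

On these examples the strict-edge candidate misclassifies the overlapping pair, so the center-based candidate is selected; the same disagreement-resolution principle is what makes generating multiple candidates and labeling a few examples worthwhile.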