Extract-Transform-Load for Video Streams

Proc. VLDB Endow., Pub Date: 2023-05-01, DOI: 10.14778/3598581.3598600

Ferdinand Kossmann, Ziniu Wu, Eugenie Lai, Nesime Tatbul, Lei Cao, Tim Kraska, S. Madden

Pages: 2302-2315


Abstract

Social media, self-driving cars, and traffic cameras produce video streams at large scale and low cost. However, storing and querying video at such scale is prohibitively expensive. We propose to treat large-scale video analytics as a data warehousing problem: video is a format that is easy to produce but needs to be transformed into an application-specific format that is easy to query. Analogously, we define the problem of Video Extract-Transform-Load (V-ETL). V-ETL systems need to reduce the cost of running a user-defined V-ETL job while also giving throughput guarantees to keep up with the rate at which data is produced. We find that no current system sufficiently fulfills both needs and therefore propose Skyscraper, a system tailored to V-ETL. Skyscraper can execute arbitrary video ingestion pipelines and adaptively tunes them to reduce cost with minimal or no quality degradation, e.g., by adjusting sampling rates and resolutions to the ingested content. Skyscraper can thereby be provisioned with cheap on-premises compute, and it uses a combination of buffering and cloud bursting to deal with peaks in workload caused by expensive processing configurations. In our experiments, we find that Skyscraper significantly reduces the cost of V-ETL ingestion compared to adaptations of current state-of-the-art systems, while at the same time giving robustness guarantees that these systems lack.
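The interplay of buffering and cloud bursting mentioned in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions and not Skyscraper's actual algorithm: the function name simulate, the abstract "work units", the fixed capacities, and the policy of bursting only when the buffer overflows are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    on_prem: float   # work units processed on premises in this step
    cloud: float     # work units sent to the cloud in this step
    buffered: float  # work units left waiting in the buffer after this step


def simulate(workload, on_prem_capacity, buffer_capacity):
    """Toy policy (an assumption, not the paper's method):
    process locally first, buffer the excess, burst to the cloud only on overflow."""
    backlog = 0.0
    results = []
    for arriving in workload:
        pending = backlog + arriving
        # Cheap on-premises compute handles as much as it can.
        on_prem = min(pending, on_prem_capacity)
        pending -= on_prem
        # Whatever does not fit into the buffer is offloaded to the cloud.
        cloud = max(0.0, pending - buffer_capacity)
        backlog = pending - cloud
        results.append(StepResult(on_prem, cloud, backlog))
    return results


if __name__ == "__main__":
    # Hypothetical per-step processing cost; the peak models an expensive configuration.
    workload = [2, 2, 8, 8, 2, 1]
    for step, result in enumerate(simulate(workload, on_prem_capacity=4, buffer_capacity=5)):
        print(step, result)
```

In this toy run, the buffer absorbs the first peak step entirely, and only the part of the second peak step that exceeds the buffer is sent to the cloud. The backlog then drains on premises once the workload drops.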
