Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines

Francisco Romero, Mark Zhao, N. Yadwadkar, C. Kozyrakis
{"title":"Llama:用于自动调整视频分析管道的异构和无服务器框架","authors":"Francisco Romero, Mark Zhao, N. Yadwadkar, C. Kozyrakis","doi":"10.1145/3472883.3486972","DOIUrl":null,"url":null,"abstract":"The proliferation of camera-enabled devices and large video repositories has led to a diverse set of video analytics applications. These applications rely on video pipelines, represented as DAGs of operations, to transform videos, process extracted metadata, and answer questions like, \"Is this intersection congested?\" The latency and resource efficiency of pipelines can be optimized using configurable knobs for each operation (e.g., sampling rate, batch size, or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, and (b) the optimal configuration depends on users' desired latency and cost targets, (c) input video contents may exercise different paths in the DAG and produce a variable amount intermediate results. Existing video analytics and processing systems leave it to the users to manually configure operations and select hardware resources. We present Llama: a heterogeneous and serverless framework for auto-tuning video pipelines. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. This makes the problem of auto-tuning large video pipelines tractable and allows us to handle input-dependent behavior, conditional branches in the DAG, and execution variability. We describe the algorithms in Llama and evaluate it on a cloud platform using serverless CPU and GPU resources. We show that compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"113 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":"{\"title\":\"Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines\",\"authors\":\"Francisco Romero, Mark Zhao, N. Yadwadkar, C. Kozyrakis\",\"doi\":\"10.1145/3472883.3486972\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The proliferation of camera-enabled devices and large video repositories has led to a diverse set of video analytics applications. These applications rely on video pipelines, represented as DAGs of operations, to transform videos, process extracted metadata, and answer questions like, \\\"Is this intersection congested?\\\" The latency and resource efficiency of pipelines can be optimized using configurable knobs for each operation (e.g., sampling rate, batch size, or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, and (b) the optimal configuration depends on users' desired latency and cost targets, (c) input video contents may exercise different paths in the DAG and produce a variable amount intermediate results. 
Existing video analytics and processing systems leave it to the users to manually configure operations and select hardware resources. We present Llama: a heterogeneous and serverless framework for auto-tuning video pipelines. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. This makes the problem of auto-tuning large video pipelines tractable and allows us to handle input-dependent behavior, conditional branches in the DAG, and execution variability. We describe the algorithms in Llama and evaluate it on a cloud platform using serverless CPU and GPU resources. We show that compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average.\",\"PeriodicalId\":91949,\"journal\":{\"name\":\"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)\",\"volume\":\"113 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-02-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"50\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3472883.3486972\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3472883.3486972","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 50

Abstract

The proliferation of camera-enabled devices and large video repositories has led to a diverse set of video analytics applications. These applications rely on video pipelines, represented as DAGs of operations, to transform videos, process extracted metadata, and answer questions like, "Is this intersection congested?" The latency and resource efficiency of pipelines can be optimized using configurable knobs for each operation (e.g., sampling rate, batch size, or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, (b) the optimal configuration depends on users' desired latency and cost targets, and (c) input video contents may exercise different paths in the DAG and produce a variable amount of intermediate results. Existing video analytics and processing systems leave it to the users to manually configure operations and select hardware resources. We present Llama: a heterogeneous and serverless framework for auto-tuning video pipelines. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. This makes the problem of auto-tuning large video pipelines tractable and allows us to handle input-dependent behavior, conditional branches in the DAG, and execution variability. We describe the algorithms in Llama and evaluate it on a cloud platform using serverless CPU and GPU resources. We show that compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average.
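
To make the two-step idea in the abstract concrete, below is a minimal, hypothetical Python sketch: it splits an end-to-end latency target into per-operation targets (here, simply in proportion to each operation's fastest achievable latency) and then greedily picks the cheapest configuration that meets each per-invocation target. The operation names, hardware profiles, and all numbers are illustrative assumptions, not from the paper; Llama's actual planner additionally handles DAG branches, input-dependent behavior, and execution variability.

```python
from dataclasses import dataclass

@dataclass
class Config:
    hardware: str    # e.g., "cpu-lambda" or "gpu" serverless backend (assumed)
    batch_size: int
    latency: float   # estimated seconds per invocation (hypothetical profile)
    cost: float      # estimated dollars per invocation (hypothetical profile)

# Hypothetical profiles for a three-operation pipeline
# (decode -> detect -> classify). All numbers are illustrative.
PROFILES = {
    "decode":   [Config("cpu-lambda", 1, 0.8, 0.002),
                 Config("cpu-lambda", 8, 0.5, 0.004)],
    "detect":   [Config("cpu-lambda", 1, 4.0, 0.010),
                 Config("gpu", 16, 0.6, 0.015)],
    "classify": [Config("cpu-lambda", 1, 1.2, 0.003),
                 Config("gpu", 32, 0.3, 0.012)],
}

def plan(pipeline, end_to_end_target):
    """Split the end-to-end latency target into per-operation targets
    (proportionally to each operation's fastest achievable latency),
    then pick the cheapest configuration meeting each target."""
    fastest = {op: min(c.latency for c in PROFILES[op]) for op in pipeline}
    total = sum(fastest.values())
    assignment = {}
    for op in pipeline:
        per_op_target = end_to_end_target * fastest[op] / total
        feasible = [c for c in PROFILES[op] if c.latency <= per_op_target]
        # Cheapest feasible config; fall back to the fastest one if the
        # per-operation target is unachievable.
        assignment[op] = (min(feasible, key=lambda c: c.cost) if feasible
                          else min(PROFILES[op], key=lambda c: c.latency))
    return assignment

if __name__ == "__main__":
    for op, cfg in plan(["decode", "detect", "classify"], 3.0).items():
        print(f"{op}: {cfg}")
```

Even this naive proportional split illustrates why per-invocation latency targets make the problem tractable: once each operation's latency budget is fixed, its configuration can be chosen independently, avoiding a search over the exponential joint configuration space.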