DualRT: A QoS-Aware Soft Real-Time Video Analytics Framework for Dual-Stage GPU-CPU Tasks on Edge

IF 1.5 · Zone 4 (Computer Science) · JCR Q3 (Computer Science, Software Engineering)

Concurrency and Computation-Practice & Experience · Volume 37, Issue 18-20 · Pub Date: 2025-07-03 · DOI: 10.1002/cpe.70174

Changhong Zhu, Haitao Zhang, Xingtao Xu


Citations: 0

Abstract

Edge cameras are ubiquitous, and with the recent boom in computer vision technology, a growing variety of video analytics tasks is being processed at the edge. Supporting more complex video analytics tasks on edge servers with unpredictable request loads and limited resources is challenging. Most existing works, however, apply a single optimization approach: they improve one performance metric in a single processing stage, ignore the balance with other metrics, and leave little room for optimization. For video analytics tasks that must be split into two stages, one on the GPU and one on the CPU, this one-sided focus may lead to imbalanced execution performance or even a negative effect on quality of service (QoS). In addition, fully utilizing the valuable resources of an edge server often requires scheduling multiple types of video analytics tasks on it. Yet most existing scheduling strategies focus only on how to allocate computational resources to end-to-end tasks. They do not account for how tasks execute in their different stages or for the mutual interference among tasks. Lacking stage-sensitivity and interference-sensitivity, these strategies may cause performance conflicts when multiple GPU-CPU dual-stage tasks run together, which affects overall QoS. To address these challenges, we first evaluate the impact of batch processing, frame rate control, resolution selection, and CPU concurrency on throughput, latency, and accuracy when running dual-stage tasks on edge platforms. We then propose DualRT, a soft real-time video analytics framework for dual-stage tasks that optimizes their QoS while avoiding request stacking on edge platforms. In DualRT's scheduling module, we design a scheduling method that combines a multi-agent deep reinforcement learning algorithm with a variable time window approach to schedule multiple dual-stage tasks, jointly controlling batch size, resolution, frame rate, and CPU concurrency for each task. Our experimental results show that DualRT improves QoS by an average of 13.3% and maximum throughput by an average of 24.6% compared to state-of-the-art solutions.
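To make the interaction between the four knobs concrete, the following is a minimal sketch of a toy capacity model for a dual-stage GPU-CPU task. It is not DualRT's scheduler or cost model: the `Knobs` class, the function names, the quadratic resolution scaling, and every cost constant are assumptions chosen only for illustration. It shows how a slower stage becomes the bottleneck and how requests stack once the admitted frame rate exceeds that bottleneck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knobs:
    """The four per-task controls named in the abstract (hypothetical container)."""
    batch_size: int     # frames per GPU inference batch
    frame_rate: float   # frames per second admitted from the camera
    resolution: int     # frame height in pixels, e.g. 720
    cpu_workers: int    # concurrent CPU post-processing workers


# Hypothetical cost constants, not measurements from the paper.
GPU_BATCH_OVERHEAD_S = 0.010    # fixed cost per GPU batch launch
GPU_PER_FRAME_S_AT_720 = 0.004  # GPU time per frame at 720p
CPU_PER_FRAME_S_AT_720 = 0.020  # CPU time per frame at 720p


def resolution_scale(resolution: int) -> float:
    """Assume per-frame cost grows with pixel count (quadratic in height)."""
    return (resolution / 720) ** 2


def gpu_throughput(k: Knobs) -> float:
    """Frames per second the GPU stage can sustain under the toy model."""
    per_frame = GPU_PER_FRAME_S_AT_720 * resolution_scale(k.resolution)
    batch_time = GPU_BATCH_OVERHEAD_S + k.batch_size * per_frame
    return k.batch_size / batch_time


def cpu_throughput(k: Knobs) -> float:
    """Frames per second the CPU stage can sustain with k.cpu_workers workers."""
    per_frame = CPU_PER_FRAME_S_AT_720 * resolution_scale(k.resolution)
    return k.cpu_workers / per_frame


def bottleneck_throughput(k: Knobs) -> float:
    """End-to-end capacity is limited by the slower of the two stages."""
    return min(gpu_throughput(k), cpu_throughput(k))


def is_stacking(k: Knobs) -> bool:
    """Requests pile up when frames arrive faster than the bottleneck drains them."""
    return k.frame_rate > bottleneck_throughput(k)


if __name__ == "__main__":
    base = Knobs(batch_size=8, frame_rate=60, resolution=720, cpu_workers=1)
    print("GPU fps:", round(gpu_throughput(base), 1))
    print("CPU fps:", round(cpu_throughput(base), 1))
    print("stacking:", is_stacking(base))
```

Under these assumed constants, raising the batch size alone does not help the 60 fps configuration, because the CPU stage is the bottleneck; adding a second CPU worker or lowering the frame rate does. This is the kind of cross-stage imbalance the abstract says single-metric, single-stage optimization can miss.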


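Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review time: 9.6 months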

About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers and authoritative research review papers in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.