Re-CATA: Real-Time and Flexible Accelerator Design Framework for On-Device Codec Avatars

Impact factor: 2.7; Zone 3 (Computer Science); JCR Q2 (Computer Science, Hardware & Architecture)
Yongan Zhang;Yuecheng Li;Syed Shakib Sarwar;H. Ekin Sumbul;Yonggan Fu;Haoran You;Cheng Wan;Yingyan Lin
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 8, pp. 3020-3033. Journal article, published 2025-02-06. DOI: 10.1109/TCAD.2025.3539600. https://ieeexplore.ieee.org/document/10876391/
Citation count: 0

Abstract

Real-time Codec Avatars, which employ deep generative models for 3-D reconstruction of human features, are crucial for immersive telepresence in augmented reality and virtual reality (AR/VR) environments. However, deploying these avatars in real time on AR/VR headsets is challenging because existing devices cannot achieve satisfactory performance within stringent hardware resource constraints. To address this challenge, we introduce Re-CATA, a full-stack and flexible Codec Avatar accelerator design framework. Re-CATA is designed to deliver real-time throughput (greater than 120 FPS) for the complete Codec Avatar processing pipeline within an edge-level power budget of 5 W under FPGA prototyping. Our approach begins by abstracting the operation mapping and scheduling challenges inherent in Codec Avatars, which require both centralized and distributed processing to handle dynamically changing workloads. We propose a hardware resource and workload partitioning scheme optimized for these fluctuating demands. Because static partitioning alone falls short when workloads change rapidly, we complement it with an agile runtime scheduling system that reallocates workloads among computing units as needed. Furthermore, our micro-architecture design incorporates unified computing modules and efficient hardware peripherals, enabling workload balancing across the Codec Avatar processing pipeline. We evaluate the Re-CATA accelerators via on-board FPGA prototyping, comparing them to various baselines, including commercial AR/VR systems-on-chip and academic accelerators. This evaluation demonstrates a speedup of up to $5.95\times$ under similar settings.
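
As a rough illustration of what the stated targets imply (a back-of-the-envelope sketch, not a figure from the paper), assume the accelerator runs at exactly the 120 FPS floor and draws the full 5 W budget. The symbols $t_{\text{frame}}$ and $E_{\text{frame}}$ are introduced here only for illustration:

$$
t_{\text{frame}} = \frac{1}{120\ \text{s}^{-1}} \approx 8.33\ \text{ms}, \qquad E_{\text{frame}} = 5\ \text{W} \times t_{\text{frame}} \approx 41.7\ \text{mJ}.
$$

Because the reported throughput exceeds 120 FPS and the power stays within 5 W, both values are upper bounds: the complete pipeline must finish each frame in under about 8.33 ms while spending less than about 41.7 mJ.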
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Articles published: 500
Review time: 7 months
Journal scope: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical, and logical design, including planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design, and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.