Superscalar Time-Triggered Versatile-Tensor Accelerator

IF 2.9 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | Pub Date: 2025-01-10 | DOI: 10.1109/TCAD.2025.3528355

Yosab Bebawy;Aniebiet Micheal Ezekiel;Roman Obermaisser

Volume 44, issue 7, pages 2503-2515 | Journal Article | https://ieeexplore.ieee.org/document/10836726/

Citations: 0

Abstract

Integrating AI hardware accelerators into safety-critical real-time systems to speed up the inference of safety-critical AI applications demands rigorous assurance to prevent potentially catastrophic outcomes, especially in environments where timely and accurate results are crucial. Even when AI models are designed and constructed correctly using AI frameworks, the system’s safety also depends on the real-time behavior of the AI hardware accelerator. While AI hardware accelerators can achieve the necessary throughput, conventional accelerators, such as the versatile tensor accelerator (VTA), encounter significant challenges in predictability and reliability. These challenges stem from the variability of event-driven inference execution and from insufficient timing control, which pose considerable risks in safety-critical scenarios where delayed inference results can have severe consequences. To address this challenge, previous work introduced the time-triggered VTA (TT-VTA) to ensure timely execution of tensor operations. Nonetheless, the TT-VTA exhibited a slightly longer average inference time of 53 ms compared to the conventional VTA’s 51 ms, underscoring the need to speed up inference while sustaining the deterministic and predictable behavior of the TT-VTA. This article proposes a novel superscalar TT-VTA (STT-VTA) architecture specifically designed to address the deficiencies of conventional VTAs and TT-VTAs. The STT-VTA architecture employs pattern-based timing schedules generated by an extended software simulator and an architecture configuration manager, which analyze the tensor operations within a given AI model and determine the number of additional VTA modules required for faster inference than a single (TT-)VTA setup. It integrates DRAMSim2 for memory instructions and a cycle-accurate simulator for nonmemory instructions. Evaluation using various models demonstrates that the STT-VTA achieves the same classification accuracy as the conventional VTA and TT-VTA, while reducing inference time by 20%–41%. Moreover, it ensures deterministic temporal use of shared resources, such as memories and memory buses, and precise timing control to avoid interference. These results contribute toward the safety and reliability of AI systems deployed in safety-critical environments.
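
The idea of deterministic temporal use of a shared memory bus by several VTA modules can be illustrated with a minimal sketch. The abstract does not specify how the pattern-based schedules are constructed, so the round-robin slot table below is a swapped-in stand-in rather than the authors' method. The names `Slot`, `build_round_robin`, `is_interference_free`, and `next_window_start`, as well as the slot lengths, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    module: int  # hypothetical VTA module index
    start: int   # first cycle of the bus window (inclusive)
    end: int     # end of the bus window (exclusive)


def build_round_robin(num_modules: int, slot_cycles: int, rounds: int) -> list[Slot]:
    """Give each module a fixed, repeating window on the shared bus.

    Assumption: a plain round-robin table, not the paper's pattern-based schedule.
    """
    slots = []
    for r in range(rounds):
        for m in range(num_modules):
            start = (r * num_modules + m) * slot_cycles
            slots.append(Slot(m, start, start + slot_cycles))
    return slots


def is_interference_free(slots: list[Slot]) -> bool:
    """Return True if no two bus windows overlap in time."""
    ordered = sorted(slots, key=lambda s: s.start)
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


def next_window_start(module: int, now: int, num_modules: int, slot_cycles: int) -> int:
    """Earliest cycle >= now at which the given module's window opens."""
    period = num_modules * slot_cycles
    offset = module * slot_cycles
    return now + (offset - now) % period


if __name__ == "__main__":
    table = build_round_robin(num_modules=3, slot_cycles=4, rounds=2)
    print(is_interference_free(table))
    print(next_window_start(module=1, now=5, num_modules=3, slot_cycles=4))
```

Because every window is fixed before execution, the cycle at which a module may next access the bus is a pure function of the current cycle, which is the property that makes timing predictable in a time-triggered design. An event-driven arbiter offers no such closed-form bound.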


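Source journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 5.60

Self-citation rate: 13.80%

Articles published: 500

Review time: 7 months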

Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical, and logical design, including planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design, and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.