Superscalar Time-Triggered Versatile-Tensor Accelerator

Impact factor: 2.9 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Hardware & Architecture)
Yosab Bebawy;Aniebiet Micheal Ezekiel;Roman Obermaisser
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 7, pp. 2503-2515
DOI: 10.1109/TCAD.2025.3528355
Publication date: 2025-01-10
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10836726/
Citation count: 0

Abstract

Integrating AI hardware accelerators into safety-critical real-time systems to speed up the inference of safety-critical AI applications demands rigorous assurance to prevent potentially catastrophic outcomes, especially in environments where timely and accurate results are crucial. Even when AI models are designed and constructed correctly using AI frameworks, the system’s safety also relies on the real-time behavior of the AI hardware accelerator. While AI hardware accelerators can achieve the necessary throughput, conventional accelerators, such as the versatile tensor accelerator (VTA), encounter significant challenges in predictability and reliability. These challenges stem from the variability of event-driven inference execution and from insufficient timing control, posing considerable risks in safety-critical scenarios where delayed inference results can have severe consequences. To address this challenge, previous work introduced the time-triggered VTA (TT-VTA) to ensure timely execution of tensor operations. Nonetheless, the TT-VTA exhibited a slightly longer average inference time of 53 ms compared to the conventional VTA’s 51 ms, underscoring the need for further optimization to speed up inference execution while sustaining the deterministic and predictable behavior of the TT-VTA. This article proposes a novel superscalar TT-VTA (STT-VTA) architecture specifically designed to address the deficiencies of conventional VTAs and TT-VTAs. The STT-VTA architecture employs pattern-based timing schedules generated by an extended software simulator and an architecture configuration manager to analyze the tensor operations within a given AI model and determine the number of additional VTA modules required for faster inference than a single (TT-)VTA setup. It integrates DRAMSim2 for memory instructions and a cycle-accurate simulator for nonmemory instructions. Evaluation using various models demonstrates that the STT-VTA achieves the same classification accuracy as the conventional VTA and TT-VTA, while reducing inference time by 20%–41%. Moreover, it ensures deterministic temporal use of shared resources, such as memories and memory buses, and precise timing control to avoid interference. These results contribute toward the safety and reliability of AI systems deployed in safety-critical environments.
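The idea of deterministic temporal use of shared resources can be illustrated with a small sketch. The following is not the authors' scheduler or simulator; it is a hypothetical static schedule checker, assuming each operation is assigned a fixed start time and a worst-case duration offline, and that a schedule is interference-free when no two slots occupy the same shared resource at overlapping times. All names (`Slot`, `find_conflicts`, `vta0`, `memory_bus`) and all cycle counts are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Slot:
    """One entry of a static, time-triggered schedule (all names hypothetical)."""
    module: str    # accelerator module that issues the operation
    resource: str  # shared resource the operation occupies
    start: int     # start time in cycles, fixed offline
    duration: int  # assumed worst-case occupancy in cycles

    @property
    def end(self) -> int:
        return self.start + self.duration


def find_conflicts(schedule):
    """Return pairs of slots that use the same resource at overlapping times."""
    conflicts = []
    for a, b in combinations(schedule, 2):
        # Half-open intervals [start, end): back-to-back slots do not overlap.
        if a.resource == b.resource and a.start < b.end and b.start < a.end:
            conflicts.append((a, b))
    return conflicts


# Illustrative numbers only; they are not taken from the paper.
schedule = [
    Slot("vta0", "memory_bus", start=0, duration=40),
    Slot("vta1", "memory_bus", start=40, duration=40),
    Slot("vta0", "compute0", start=40, duration=100),
    Slot("vta1", "compute1", start=80, duration=100),
]

# Completion time of the whole schedule, known before execution.
makespan = max(slot.end for slot in schedule)

print(find_conflicts(schedule))
print(makespan)
```

In this toy schedule the two modules take turns on the memory bus and then compute in parallel, so the completion time is fixed before execution. A check of this kind would only be one small part of a real time-triggered design flow.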
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Publication volume: 500
Review time: 7 months
Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.