Superscalar Time-Triggered Versatile-Tensor Accelerator
Authors: Yosab Bebawy; Aniebiet Micheal Ezekiel; Roman Obermaisser
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 7, pp. 2503-2515 (impact factor 2.9; JCR Q2, Computer Science, Hardware & Architecture)
Publication date: 2025-01-10
DOI: 10.1109/TCAD.2025.3528355
URL: https://ieeexplore.ieee.org/document/10836726/
Citation count: 0
Abstract
Integrating AI hardware accelerators into safety-critical real-time systems to speed up the inference of safety-critical AI applications demands rigorous assurance to prevent potentially catastrophic outcomes, especially in environments where timely and accurate results are crucial. Even when AI models are designed and constructed correctly using AI frameworks, the system’s safety also relies on the real-time behavior of the AI hardware accelerator. While AI hardware accelerators can achieve the necessary throughput, conventional accelerators, such as the versatile tensor accelerator (VTA), encounter significant challenges in predictability and reliability. These challenges stem from the variability of event-driven inference execution and from insufficient timing control, posing considerable risks in safety-critical scenarios where delays in providing inference results can have severe consequences. To address this challenge, previous work introduced the time-triggered VTA (TT-VTA) to ensure timely execution of tensor operations. Nonetheless, the TT-VTA exhibited a slightly longer average inference time of 53 ms compared to the conventional VTA’s 51 ms, underscoring the need to speed up inference execution while sustaining the deterministic and predictable behavior of the TT-VTA. This article proposes a novel superscalar TT-VTA (STT-VTA) architecture specifically designed to address the deficiencies of conventional VTAs and TT-VTAs. The STT-VTA architecture employs pattern-based timing schedules generated by an extended software simulator and an architecture configuration manager to analyze tensor operations within a given AI model and determine the required number of additional VTA modules for faster inference than a single (TT-)VTA setup. It integrates DRAMSim2 for memory instructions and a cycle-accurate simulator for nonmemory instructions. Evaluation using various models demonstrates that the STT-VTA achieves the same classification accuracy as the conventional VTA and TT-VTA, while reducing inference time by 20%–41%. Moreover, it ensures deterministic temporal use of shared resources, such as memories and memory buses, and provides precise timing control to avoid interference. These results contribute toward the safety and reliability of AI systems deployed in safety-critical environments.
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.