IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems最新文献

筛选
英文 中文
Efficient Discovery of Actual Causality Using Abstraction Refinement 利用抽象细化高效发现实际因果关系
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3448299
Arshia Rafieioskouei;Borzoo Bonakdarpour
{"title":"Efficient Discovery of Actual Causality Using Abstraction Refinement","authors":"Arshia Rafieioskouei;Borzoo Bonakdarpour","doi":"10.1109/TCAD.2024.3448299","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3448299","url":null,"abstract":"Causality is the relationship where one event contributes to the production of another, with the cause being partly responsible for the effect and the effect partly dependent on the cause. In this article, we propose a novel and effective method to formally reason about the causal effect of events in engineered systems, with application for finding the root-cause of safety violations in embedded and cyber-physical systems. We are motivated by the notion of actual causality by Halpern and Pearl, which focuses on the causal effect of particular events rather than type-level causality, which attempts to make general statements about scientific and natural phenomena. Our first contribution is formulating discovery of actual causality in computing systems modeled by transition systems as an satisfiability modulo theory solving problem. Since datasets for causality analysis tend to be large, in order to tackle the scalability problem of automated formal reasoning, our second contribution is a novel technique based on abstraction refinement that allows identifying for actual causes within smaller abstract causal models. We demonstrate the effectiveness of our approach (by several orders of magnitude) using three case studies to find the actual cause of violations of safety in 1) a neural network controller for a mountain car; 2) a controller for a Lunar Lander obtained by reinforcement learning; and 3) an MPC controller for an F-16 autopilot simulator.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4274-4285"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Hyper Parametric Timed CTL 超参数定时 CTL
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443704
Masaki Waga;Étienne André
{"title":"Hyper Parametric Timed CTL","authors":"Masaki Waga;Étienne André","doi":"10.1109/TCAD.2024.3443704","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443704","url":null,"abstract":"Hyperproperties enable simultaneous reasoning about multiple execution traces of a system and are useful to reason about noninterference, opacity, robustness, fairness, observational determinism, etc. We introduce hyper parametric timed computation tree logic (HyperPTCTL), extending hyperlogics with timing reasoning and, notably, parameters to express unknown values. We mainly consider its nest-free fragment, where the temporal operators cannot be nested. However, we allow extensions that enable counting actions and comparing the duration since the most recent occurrence of specific actions. We show that our nest-free fragment with this extension is sufficiently expressive to encode the properties, e.g., opacity, (un)fairness, or robust observational (non)determinism. We propose semi-algorithms for the model checking and synthesis of parametric timed automata (TAs) (an extension of TAs with timing parameters) against this nest-free fragment with the extension via reduction to the PTCTL model checking and synthesis. While the general model checking (and thus synthesis) problem is undecidable, we show that a large part of our extended (yet nest-free) fragment is decidable, provided the parameters only appear in the property, not in the model. We also exhibit additional decidable fragments where the parameters within the model are allowed. We implemented our semi-algorithms on the top of the IMITATOR model checker and performed experiments. Our implementation supports most of the nest-free fragments (beyond the decidable classes). The experimental results highlight our method’s practical relevance.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4286-4297"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Interval Image Abstraction for Verification of Camera-Based Autonomous Systems 验证基于摄像头的自主系统的间隔图像抽象
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3448306
P. Habeeb;Deepak D’Souza;Kamal Lodaya;Pavithra Prabhakar
{"title":"Interval Image Abstraction for Verification of Camera-Based Autonomous Systems","authors":"P. Habeeb;Deepak D’Souza;Kamal Lodaya;Pavithra Prabhakar","doi":"10.1109/TCAD.2024.3448306","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3448306","url":null,"abstract":"We propose an abstraction-refinement-based algorithm for the problem of verifying the safety of a camera-based autonomous system in a synthetic 3D-scene, based on the notion of interval images. An interval image is an abstract data structure that represents a set of images in a 3D-scene. We give a computer graphics style rendering algorithm to efficiently compute interval images from a given region. Our proposed abstraction-refinement algorithm leverages recent abstract interpretation tools for neural networks. We have implemented and evaluated the proposed technique on complex 3D-scenes, demonstrating its effectiveness and scalability in comparison with earlier techniques.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4310-4321"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Statistical Reachability Analysis of Stochastic Cyber-Physical Systems Under Distribution Shift 分布偏移下随机网络物理系统的统计可达性分析
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3438072
Navid Hashemi;Lars Lindemann;Jyotirmoy V. Deshmukh
{"title":"Statistical Reachability Analysis of Stochastic Cyber-Physical Systems Under Distribution Shift","authors":"Navid Hashemi;Lars Lindemann;Jyotirmoy V. Deshmukh","doi":"10.1109/TCAD.2024.3438072","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438072","url":null,"abstract":"Reachability analysis is a popular method to give safety guarantees for stochastic cyber-physical systems (SCPSs) that takes in a symbolic description of the system dynamics and uses set-propagation methods to compute an overapproximation of the set of reachable states over a bounded time horizon. In this article, we investigate the problem of performing reachability analysis for an SCPS that does not have a symbolic description of the dynamics, but instead is described using a digital twin model that can be simulated to generate system trajectories. An important challenge is that the simulator implicitly models a probability distribution over the set of trajectories of the SCPS; however, it is typical to have a sim2real gap, i.e., the actual distribution of the trajectories in a deployment setting may be shifted from the distribution assumed by the simulator. We thus propose a statistical reachability analysis technique that, given a user-provided threshold \u0000<inline-formula> <tex-math>$1-epsilon $ </tex-math></inline-formula>\u0000, provides a set that guarantees that any trajectory during deployment lies in this set with probability not smaller than this threshold. Our method is based on three main steps: 1) learning a deterministic surrogate model from sampled trajectories; 2) conducting reachability analysis over the surrogate model; and 3) employing robust conformal inference (CI) using an additional set of sampled trajectories to quantify the surrogate model’s distribution shift with respect to the deployed SCPS. To counter conservatism in reachable sets, we propose a novel method to train surrogate models that minimizes a quantile loss term (instead of the usual mean squared loss), and a new method that provides tighter guarantees using CI using a normalized surrogate error. We demonstrate the effectiveness of our technique on various case studies.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4250-4261"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Approximate Conformance Checking for Closed-Loop Systems With Neural Network Controllers 利用神经网络控制器对闭环系统进行近似一致性检查
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3445813
P. Habeeb;Lipsy Gupta;Pavithra Prabhakar
{"title":"Approximate Conformance Checking for Closed-Loop Systems With Neural Network Controllers","authors":"P. Habeeb;Lipsy Gupta;Pavithra Prabhakar","doi":"10.1109/TCAD.2024.3445813","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3445813","url":null,"abstract":"In this article, we consider the problem of checking approximate conformance of closed-loop systems with the same plant but different neural network (NN) controllers. First, we introduce a notion of approximate conformance on NNs, which allows us to quantify semantically the deviations in closed-loop system behaviors with different NN controllers. Next, we consider the problem of computationally checking this notion of approximate conformance on two NNs. We reduce this problem to that of reachability analysis on a combined NN, thereby, enabling the use of existing NN verification tools for conformance checking. Our experimental results on an autonomous rocket landing system demonstrate the feasibility of checking approximate conformance on different NNs trained for the same dynamics, as well as the practical semantic closeness exhibited by the corresponding closed-loop systems.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4322-4333"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance CaBaFL:通过分层缓存和特征平衡进行异步联合学习
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3446881
Zeke Xia;Ming Hu;Dengke Yan;Xiaofei Xie;Tianlin Li;Anran Li;Junlong Zhou;Mingsong Chen
{"title":"CaBaFL: Asynchronous Federated Learning via Hierarchical Cache and Feature Balance","authors":"Zeke Xia;Ming Hu;Dengke Yan;Xiaofei Xie;Tianlin Li;Anran Li;Junlong Zhou;Mingsong Chen","doi":"10.1109/TCAD.2024.3446881","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446881","url":null,"abstract":"Federated learning (FL) as a promising distributed machine learning paradigm has been widely adopted in Artificial Intelligence of Things (AIoT) applications. However, the efficiency and inference capability of FL is seriously limited due to the presence of stragglers and data imbalance across massive AIoT devices, respectively. To address the above challenges, we present a novel asynchronous FL approach named CaBaFL, which includes a hierarchical cache-based aggregation mechanism and a feature balance-guided device selection strategy. CaBaFL maintains multiple intermediate models simultaneously for local training. The hierarchical cache-based aggregation mechanism enables each intermediate model to be trained on multiple devices to align the training time and mitigate the straggler issue. In specific, each intermediate model is stored in a low-level cache for local training and when it is trained by sufficient local devices, it will be stored in a high-level cache for aggregation. To address the problem of imbalanced data, the feature balance-guided device selection strategy in CaBaFL adopts the activation distribution as a metric, which enables each intermediate model to be trained across devices with totally balanced data distributions before aggregation. Experimental results show that compared to the state-of-the-art FL methods, CaBaFL achieves up to 9.26X training acceleration and 19.71% accuracy improvements.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4057-4068"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Backdoor Attacks on Safe Reinforcement Learning-Enabled Cyber–Physical Systems 对安全强化学习网络物理系统的后门攻击
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3447468
Shixiong Jiang;Mengyu Liu;Fanxin Kong
{"title":"Backdoor Attacks on Safe Reinforcement Learning-Enabled Cyber–Physical Systems","authors":"Shixiong Jiang;Mengyu Liu;Fanxin Kong","doi":"10.1109/TCAD.2024.3447468","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447468","url":null,"abstract":"Safe reinforcement learning (RL) aims to derive a control policy that navigates a safety-critical system while avoiding unsafe explorations and adhering to safety constraints. While safe RL has been extensively studied, its vulnerabilities during the policy training have barely been explored in an adversarial setting. This article bridges this gap and investigates the training time vulnerability of formal language-guided safe RL. Such vulnerability allows a malicious adversary to inject backdoor behavior into the learned control policy. First, we formally define backdoor attacks for safe RL and divide them into active and passive ones depending on whether to manipulate the observation. Second, we propose two novel algorithms to synthesize the two kinds of attacks, respectively. Both algorithms generate backdoor behaviors that may go unnoticed after deployment but can be triggered when specific states are reached, leading to safety violations. Finally, we conduct both theoretical analysis and extensive experiments to show the effectiveness and stealthiness of our methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4093-4104"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bank on Compute-Near-Memory: Design Space Exploration of Processing-Near-Bank Architectures 计算-近存银行:处理近存储架构的设计空间探索
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3442989
Rafael Medina;Giovanni Ansaloni;Marina Zapater;Alexandre Levisse;Saeideh Alinezhad Chamazcoti;Timon Evenblij;Dwaipayan Biswas;Francky Catthoor;David Atienza
{"title":"Bank on Compute-Near-Memory: Design Space Exploration of Processing-Near-Bank Architectures","authors":"Rafael Medina;Giovanni Ansaloni;Marina Zapater;Alexandre Levisse;Saeideh Alinezhad Chamazcoti;Timon Evenblij;Dwaipayan Biswas;Francky Catthoor;David Atienza","doi":"10.1109/TCAD.2024.3442989","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3442989","url":null,"abstract":"Near-DRAM computing strategies advocate for providing computational capabilities close to where data is stored. Although this paradigm can effectively address the memory-to-processor communication bottleneck, it also presents new challenges: The strict resource constraints in the memory periphery demand careful tailoring of architectural elements. We herein propose a novel framework and methodology to explore compute-near-memory designs that interface to DRAM memory banks, demonstrating the area, energy, and performance tradeoffs subject to the architectural configuration. We exemplify this methodology by conducting two studies on compute-near-bank designs: 1) analyzing the interaction between control and data resources, and 2) exploring the integration of processing units with different DRAM standards. According to our study, the optimal size ratios between instruction and data capacity vary from \u0000<inline-formula> <tex-math>$2times $ </tex-math></inline-formula>\u0000 to \u0000<inline-formula> <tex-math>$4times $ </tex-math></inline-formula>\u0000 across benchmarks from representative application domains. The retrieved Pareto-optimal solutions from our framework improve state-of-the-art designs, e.g., achieving a 50% performance increase on matrix operations with 15% energy overhead relative to the FIMDRAM design. In addition, the exploration of DRAM shows the interplay between available internal bandwidth, performance, and area overhead. For example, a threefold increase in bandwidth rises performance by 47% across workloads at a 34% extra area cost.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4117-4129"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios FlexFL:在不确定场景中通过 APoZ 引导的灵活剪枝进行异构联合学习
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3444695
Zekai Chen;Chentao Jia;Ming Hu;Xiaofei Xie;Anran Li;Mingsong Chen
{"title":"FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios","authors":"Zekai Chen;Chentao Jia;Ming Hu;Xiaofei Xie;Anran Li;Mingsong Chen","doi":"10.1109/TCAD.2024.3444695","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3444695","url":null,"abstract":"Along with the increasing popularity of deep learning (DL) techniques, more and more Artificial Intelligence of Things (AIoT) systems are adopting federated learning (FL) to enable privacy-aware collaborative learning among the AIoT devices. However, due to the inherent data and device heterogeneity issues, the existing FL-based AIoT systems suffer from the model selection problem. Although various heterogeneous FL methods have been investigated to enable collaborative training among the heterogeneous models, there is still a lack of 1) wise heterogeneous model generation methods for the devices; 2) consideration of uncertain factors; and 3) performance guarantee for the large models, thus strongly limiting the overall FL performance. To address the above issues, this article introduces a novel heterogeneous FL framework named FlexFL. By adopting our average percentage of zeros (APoZ)-guided flexible pruning strategy, FlexFL can effectively derive best-fit models for the heterogeneous devices to explore their greatest potential. Meanwhile, our proposed adaptive local pruning strategy allows the AIoT devices to prune their received models according to their varying resources within uncertain scenarios. Moreover, based on the self-knowledge distillation, FlexFL can enhance the inference performance of the large models by learning the knowledge from the small models. Comprehensive experimental results show that, compared to the state-of-the-art heterogeneous FL methods, FlexFL can significantly improve the overall inference accuracy by up to 14.24%. Our code can be found here \u0000<uri>https://github.com/mastlab-T3S/FlexFL</uri>\u0000.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4069-4080"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers Deeploy:在异构微控制器上实现小型语言模型的高能效部署
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443718
Moritz Scherer;Luka Macan;Victor J. B. Jung;Philip Wiese;Luca Bompani;Alessio Burrello;Francesco Conti;Luca Benini
{"title":"Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers","authors":"Moritz Scherer;Luka Macan;Victor J. B. Jung;Philip Wiese;Luca Bompani;Alessio Burrello;Francesco Conti;Luca Benini","doi":"10.1109/TCAD.2024.3443718","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443718","url":null,"abstract":"With the rise of embodied foundation models (EFMs), most notably small language models (SLMs), adapting Transformers for the edge applications has become a very active field of research. However, achieving the end-to-end deployment of SLMs on the microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access is still an open challenge. In this article, we demonstrate high efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit (NPU). To automate the exploration of the constrained, multidimensional memory versus computation tradeoffs involved in the aggressive SLM deployment on the heterogeneous (multicore+NPU) resources, we introduce Deeploy, a novel deep neural network (DNN) compiler, which generates highly optimized C code requiring minimal runtime support. We demonstrate that Deeploy generates the end-to-end code for executing SLMs, fully exploiting the RV32 cores’ instruction extensions and the NPU. We achieve leading-edge energy and throughput of \u0000<inline-formula> <tex-math>$490 ; mu $ </tex-math></inline-formula>\u0000J per token, at 340 token per second for an SLM trained on the TinyStories dataset, running for the first time on an MCU-class device without the external memory.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4009-4020"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信