IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems: Latest Articles

KPAC: Efficient Emulation of the ARM Pointer Authentication Instructions
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443773
Illia Ostapyshyn;Gabriele Serra;Tim-Marek Thomas;Daniel Lohmann
{"title":"KPAC: Efficient Emulation of the ARM Pointer Authentication Instructions","authors":"Illia Ostapyshyn;Gabriele Serra;Tim-Marek Thomas;Daniel Lohmann","doi":"10.1109/TCAD.2024.3443773","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443773","url":null,"abstract":"ARMv8.3-A has introduced the pointer authentication (PA) feature, a new set of measures and instructions to sign and validate pointers. PA is already used and supported by the major compilers to protect the return addresses on the stack as a measure against memory corruption attacks. As more and more SoCs implement ARMv8.3-A and code compiled with PA is even fully backwards compatible on CPUs without (where the new instructions are just ignored), we can expect PA-enabled binaries to become standard in the near future. This gives rise to the question, if and how also systems without the native PA could benefit from the extra security provided by the return address protection. In this article, we explore KPAC, a set of efficient software-based approaches to bring the PA-based return-address protection onto the platforms without the hardware support in an easily adoptable (binary-compatible) and scalable manner. Technically, KPAC achieves this by either a synchronous trap-based emulation inside the kernel or an asynchronous novel memory-based invocation of a dedicated CPU core. Our experiments with the CortexSuite benchmarks, Chromium, and Memcached on a variety of platforms running Linux ranging from a Xilinx ZCU102 board over a Raspberry Pi 4 up to an 80-core Ampere Altra demonstrate the broad applicability and scalability of our approach. 
Furthermore, we discuss how the principles of KPAC can be generalized to the other suited problem areas.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3467-3478"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
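To make the sign-and-validate idea in the KPAC abstract concrete, here is a minimal software sketch of pointer signing. It is not KPAC's implementation and does not reproduce the ARM hardware algorithm: HMAC-SHA256 stands in for the hardware cipher, and the 48-bit address width, the 16-bit tag, and the names `sign_pointer` and `auth_pointer` are assumptions made for illustration only.

```python
import hashlib
import hmac

VA_BITS = 48                       # assumed virtual-address width
ADDR_MASK = (1 << VA_BITS) - 1
TAG_BITS = 64 - VA_BITS            # unused upper bits hold the tag


def _tag(key: bytes, addr: int, modifier: int) -> int:
    """Truncated keyed MAC over (address, modifier); HMAC is a stand-in."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << TAG_BITS) - 1)


def sign_pointer(key: bytes, ptr: int, modifier: int) -> int:
    """Return the pointer with a tag stored in its upper bits."""
    addr = ptr & ADDR_MASK
    return (_tag(key, addr, modifier) << VA_BITS) | addr


def auth_pointer(key: bytes, signed: int, modifier: int) -> int:
    """Check the tag and return the plain address, or raise ValueError."""
    addr = signed & ADDR_MASK
    if (signed >> VA_BITS) != _tag(key, addr, modifier):
        raise ValueError("pointer authentication failed")
    return addr
```

In return-address protection the modifier is typically the stack pointer at function entry, so a signed return address copied to a different stack frame would be expected to fail validation.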
Multimode Security-Aware Real-Time Scheduling on Multiprocessors
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3445260
Jiankang Ren;Chunxiao Liu;Chi Lin;Wei Jiang;Pengfei Wang;Xiangwei Qi;Simeng Li;Shengyu Li
{"title":"Multimode Security-Aware Real-Time Scheduling on Multiprocessors","authors":"Jiankang Ren;Chunxiao Liu;Chi Lin;Wei Jiang;Pengfei Wang;Xiangwei Qi;Simeng Li;Shengyu Li","doi":"10.1109/TCAD.2024.3445260","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3445260","url":null,"abstract":"Embedded real-time systems generally execute in a predictable and deterministic manner to deliver critical functionality within stringent timing constraints. However, the predictable execution behavior leaves the system vulnerable to schedule-based attacks. In this article, we present a multimode security-aware real-time scheduling scheme to counteract schedule-based attacks on multiprocessor real-time systems. To mitigate the vulnerability to the schedule-based attack, we propose a multimode scheduling method to reduce the accumulative attack effective window (AEW) of multiple victim tasks and prevent the untrusted tasks from executing during the AEW by distinctively scheduling mixed-trust tasks according to the system mode. To avoid the protection degradation due to the excessive blocking of untrusted tasks, we introduce a protection window for multiple victims on multiprocessors by analyzing the system protection capability limit under the system schedulability constraint. Furthermore, to maximize the protection capability of the multimode security-aware scheduling strategy on a multiprocessor platform, we also propose a security-aware packing algorithm to balance the workloads of mixed-trust tasks on different processors using a mixed-trust worst-fit decreasing heuristic strategy. The experimental results demonstrate that our proposed approach significantly outperforms the state-of-the-art method. 
Specifically, the AEW ratio and the AEW untrusted execution time ratio are reduced by 18.8% and 62.8%, respectively, while the defense success rate against ScheduLeak attack is improved by 16.3%.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3407-3418"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
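The packing step mentioned in the abstract builds on the worst-fit decreasing heuristic. The sketch below shows only the plain, textbook heuristic for balancing task utilizations across processors; how the paper treats trusted and untrusted tasks differently is not described in the abstract, so that part is left out. The function name and the unit capacity per processor are assumptions.

```python
def worst_fit_decreasing(utilizations, num_processors, capacity=1.0):
    """Assign tasks to processors, largest first, always picking the
    least-loaded processor. Returns per-processor task-index lists,
    or None if some task does not fit."""
    loads = [0.0] * num_processors
    assignment = [[] for _ in range(num_processors)]
    # Visit tasks in order of decreasing utilization.
    order = sorted(enumerate(utilizations), key=lambda t: t[1], reverse=True)
    for task_id, u in order:
        # Worst fit: the processor with the most remaining capacity.
        target = min(range(num_processors), key=loads.__getitem__)
        if loads[target] + u > capacity + 1e-12:
            return None
        loads[target] += u
        assignment[target].append(task_id)
    return assignment
```

For example, utilizations `[0.5, 0.3, 0.2, 0.4]` on two processors end up as two groups with a load of 0.7 each, which is the balancing effect the heuristic is chosen for.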
AxOSpike: Spiking Neural Networks-Driven Approximate Operator Design
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443000
Salim Ullah;Siva Satyendra Sahoo;Akash Kumar
{"title":"AxOSpike: Spiking Neural Networks-Driven Approximate Operator Design","authors":"Salim Ullah;Siva Satyendra Sahoo;Akash Kumar","doi":"10.1109/TCAD.2024.3443000","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443000","url":null,"abstract":"Approximate computing (AxC) is being widely researched as a viable approach to deploying compute-intensive artificial intelligence (AI) applications on resource-constrained embedded systems. In general, AxC aims to provide disproportionate gains in system-level power-performance-area (PPA) by leveraging the implicit error tolerance of an application. One of the more widely used methods in AxC involves circuit pruning of arithmetic operators used to process AI workloads. However, most related works adopt an application-agnostic approach to operator modeling for the design space exploration (DSE) of Approximate Operators (AxOs). To this end, we propose an application-driven approach to designing AxOs. Specifically, we use spiking neural network (SNN)-based inference to present an application-driven operator model resulting in AxOs with better-PPA-accuracy tradeoffs compared to traditional circuit pruning. Additionally, we present a novel FPGA-specific operator model to improve the quality of AxOs that can be obtained using circuit pruning. With the proposed methods, we report designs with up to 26.5% lower PDPxLUTs with similar application-level accuracy. 
Further, we report a considerably better set of design points than related works with up to 51% better-Pareto front hypervolume.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3324-3335"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
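The abstract compares design points by Pareto-front hypervolume. As a rough illustration of that metric, the sketch below computes the hypervolume for two minimized objectives (for instance, an error measure and a cost measure) relative to a reference point. The objective pairing, the reference point, and the function names are assumptions; the paper's actual objectives and setup are not given in the abstract.

```python
def pareto_front(points):
    """Non-dominated points for two objectives that are both minimized."""
    front = []
    best_y = float("inf")
    for x, y in sorted(set(points)):      # ascending x, then ascending y
        if y < best_y:                    # strictly better second objective
            front.append((x, y))
            best_y = y
    return front


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded by the reference point."""
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    volume = 0.0
    prev_y = ref[1]
    for x, y in pareto_front(inside):     # x ascending, y descending
        volume += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return volume
```

A larger hypervolume means the front covers more of the objective space, so a relative improvement is the ratio of two such areas computed against the same reference point.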
VALO: A Versatile Anytime Framework for LiDAR-Based Object Detection Deep Neural Networks
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443774
Ahmet Soyyigit;Shuochao Yao;Heechul Yun
{"title":"VALO: A Versatile Anytime Framework for LiDAR-Based Object Detection Deep Neural Networks","authors":"Ahmet Soyyigit;Shuochao Yao;Heechul Yun","doi":"10.1109/TCAD.2024.3443774","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443774","url":null,"abstract":"This work addresses the challenge of adapting dynamic deadline requirements for the LiDAR object detection deep neural networks (DNNs). The computing latency of object detection is critically important to ensure safe and efficient navigation. However, the state-of-the-art LiDAR object detection DNNs often exhibit significant latency, hindering their real-time performance on the resource-constrained edge platforms. Therefore, a tradeoff between the detection accuracy and latency should be dynamically managed at runtime to achieve the optimum results. In this article, we introduce versatile anytime algorithm for the LiDAR Object detection (VALO), a novel data-centric approach that enables anytime computing of 3-D LiDAR object detection DNNs. VALO employs a deadline-aware scheduler to selectively process the input regions, making execution time and accuracy tradeoffs without architectural modifications. Additionally, it leverages efficient forecasting of the past detection results to mitigate possible loss of accuracy due to partial processing of input. Finally, it utilizes a novel input reduction technique within its detection heads to significantly accelerate the execution without sacrificing accuracy. We implement VALO on the state-of-the-art 3-D LiDAR object detection networks, namely CenterPoint and VoxelNext, and demonstrate its dynamic adaptability to a wide range of time constraints while achieving higher accuracy than the prior state-of-the-art. 
Code is available at https://github.com/CSL-KU/VALO.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4045-4056"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443692
Peiyan Dong;Jinming Zhuang;Zhuoping Yang;Shixin Ji;Yanyu Li;Dongkuan Xu;Heng Huang;Jingtong Hu;Alex K. Jones;Yiyu Shi;Yanzhi Wang;Peipei Zhou
{"title":"EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture","authors":"Peiyan Dong;Jinming Zhuang;Zhuoping Yang;Shixin Ji;Yanyu Li;Dongkuan Xu;Heng Huang;Jingtong Hu;Alex K. Jones;Yiyu Shi;Yanzhi Wang;Peipei Zhou","doi":"10.1109/TCAD.2024.3443692","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443692","url":null,"abstract":"While vision transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (<1 [...] 13.1× over computing solutions of Intel Xeon 8375C vCPU, Nvidia A10G, A100, Jetson AGX Orin GPUs, AMD ZCU102, and U250 FPGAs. The energy efficiency gains are 62.2, 15.33, 12.82, 13.31, 13.5, and 21.9×.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3949-3960"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NOBtree: A NUMA-Optimized Tree Index for Nonvolatile Memory
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3438111
Zhaole Chu;Peiquan Jin;Yongping Luo;Xiaoliang Wang;Shouhong Wan
{"title":"NOBtree: A NUMA-Optimized Tree Index for Nonvolatile Memory","authors":"Zhaole Chu;Peiquan Jin;Yongping Luo;Xiaoliang Wang;Shouhong Wan","doi":"10.1109/TCAD.2024.3438111","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438111","url":null,"abstract":"Nonvolatile memory (NVM) suffers from more serious nonuniform memory access (NUMA) effects than DRAM because of the lower bandwidth and higher latency. While numerous works have aimed at optimizing NVM indexes, only a few of them tried to address the NUMA impact. Existing approaches mainly rely on local NVM write buffers or DRAM-based read buffers to mitigate the cost of remote NVM access, which introduces memory overhead and causes performance degradation for lookup and scan operations. In this article, we present NOBtree, a new NUMA-optimized persistent tree index. The novelty of NOBtree is two-fold. First, NOBtree presents per-NUMA replication and an efficient node-migration mechanism to reduce remote NVM access. Second, NOBtree proposes a NUMA-aware NVM allocator to improve the insert performance and scalability. We conducted experiments on six workloads to evaluate the performance of NOBtree. The results show that NOBtree can effectively reduce the number of remote NVM accesses. 
Moreover, NOBtree outperforms existing persistent indexes, including TLBtree, Fast&Fair, ROART, and PACtree, by up to 3.23× in throughput and 4.07× in latency.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3840-3851"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443706
Weihong Liu;Zongwei Zhu;Boyu Li;Yi Xiong;Zirui Lian;Jiawei Geng;Xuehai Zhou
{"title":"Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices","authors":"Weihong Liu;Zongwei Zhu;Boyu Li;Yi Xiong;Zirui Lian;Jiawei Geng;Xuehai Zhou","doi":"10.1109/TCAD.2024.3443706","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443706","url":null,"abstract":"The surge in intelligent edge computing has propelled the adoption and expansion of the distributed embedded systems (DESs). Numerous scheduling strategies are introduced to improve the DES throughput, such as latency-aware and group-based hierarchical scheduling. Effective device modeling can help in modular and plug-in scheduler design. For uniformity in scheduling interfaces, unified device performance modeling is adopted, typically involving the system-level modeling that incorporates both the hardware and software stacks, broadly divided into two categories. Fine-grained modeling methods based on the hardware architecture analysis become very difficult when dealing with a large number of heterogeneous devices, mainly because much architecture information is closed-source and costly to analyse. Coarse-grained methods are based on the limited architecture information or benchmark models, resulting in insufficient generalization in the complex inference performance of diverse deep neural networks (DNNs). Therefore, we introduce a two-stage system-level modeling method (Arch2End), combining limited architecture information with scalable benchmark models to achieve a unified performance representation. Stage one leverages public information to analyse architectures in a uniform abstraction and to design the benchmark models for exploring the device performance boundaries, ensuring uniformity. Stage two extracts critical device features from the end-to-end inference metrics of extensive simulation models, ensuring universality and enhancing characterization capacity. 
Compared to the state-of-the-art methods, Arch2End achieves the lowest DNN latency prediction relative errors in the NAS-Bench-201 (1.7%) and real-world DNNs (8.2%). It also showcases superior performance in intergroup balanced device grouping strategies.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4154-4165"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3446719
Salma Afifi;Ishan Thakkar;Sudeep Pasricha
{"title":"ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks","authors":"Salma Afifi;Ishan Thakkar;Sudeep Pasricha","doi":"10.1109/TCAD.2024.3446719","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446719","url":null,"abstract":"Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when compared to conventional approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Nevertheless, transformers typically demand substantial execution time due to their extensive computations and large memory footprint. Processing in-memory (PIM) and near-memory computing (NMC) are promising solutions to accelerating transformers as they offer high-compute parallelism and memory bandwidth. However, designing PIM/NMC architectures to support the complex operations and massive amounts of data that need to be moved between layers in transformer neural networks remains a challenge. We propose ARTEMIS, a mixed analog-stochastic in-DRAM accelerator for transformer models. Through employing minimal changes to the conventional DRAM arrays, ARTEMIS efficiently alleviates the costs associated with transformer model execution by supporting stochastic computing for multiplications and temporal analog accumulations using a novel in-DRAM metal-on-metal capacitor. 
Our analysis indicates that ARTEMIS exhibits at least 3.0× speedup, and 1.8× lower energy compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3336-3347"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
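The ARTEMIS abstract relies on stochastic computing for multiplications. As background, the sketch below shows the textbook unipolar scheme, in which a value in [0, 1] is encoded as the fraction of ones in a random bitstream and a bitwise AND of two independent streams approximates the product. It says nothing about the in-DRAM circuits of the paper; the stream length, seeds, and function names are assumptions.

```python
import random


def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a random stream of 0/1 bits."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stochastic_multiply(a, b, length=4096, seed=0):
    """Approximate a * b by AND-ing two independently generated streams."""
    stream_a = to_bitstream(a, length, random.Random(seed))
    stream_b = to_bitstream(b, length, random.Random(seed + 1))
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length          # fraction of ones estimates the product
```

The estimate is noisy: its error shrinks roughly with the square root of the stream length, which is the accuracy versus latency trade-off inherent to this encoding.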
Latent RAGE: Randomness Assessment Using Generative Entropy Models
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3449562
Kuheli Pratihar;Rajat Subhra Chakraborty;Debdeep Mukhopadhyay
{"title":"Latent RAGE: Randomness Assessment Using Generative Entropy Models","authors":"Kuheli Pratihar;Rajat Subhra Chakraborty;Debdeep Mukhopadhyay","doi":"10.1109/TCAD.2024.3449562","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3449562","url":null,"abstract":"NIST’s recent review of the widely employed special publication (SP) 800–22 randomness testing suite has underscored several shortcomings, particularly the absence of entropy source modeling and the necessity for large sequence lengths. Motivated by this revelation, we explore low-dimensional modeling of the entropy source in random number generators (RNGs) using a variational autoencoder (VAE). This low-dimensional modeling enables the separation between strong and weak entropy sources by magnifying the deterministic effects in the latter, which are otherwise difficult to detect with conventional testing. Bits from weak-entropy RNGs with bias, correlation, or deterministic patterns are more likely to lie on a low-dimensional manifold within a high-dimensional space, in contrast to strong-entropy RNGs, such as true RNGs (TRNGs) and pseudo-RNGs (PRNGs) with uniformly distributed bits. We exploit this insight to employ a generative AI-based noninterference test (GeNI) for the first time, achieving implementation-agnostic low-dimensional modeling of all types of entropy sources. GeNI’s generative aspect uses VAEs to produce synthetic bitstreams from the latent representation of RNGs, which are subjected to a deep learning (DL)-based noninterference (NI) test evaluating the masking ability of the synthetic bitstreams. The core principle of the NI test is that if the bitstream exhibits high-quality randomness, the masked data from the two sources should be indistinguishable. 
GeNI facilitates a comparative analysis of low-dimensional entropy source representations across various RNGs, adeptly identifying the artificial randomness in specious RNGs with deterministic patterns that otherwise passes all NIST SP800-22 tests. Notably, GeNI achieves this with 10× lower-sequence lengths and 16.5× faster execution time compared to the NIST test suite.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3503-3514"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
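For context on the conventional testing that the abstract contrasts GeNI with, the sketch below implements the frequency (monobit) test from NIST SP 800-22, the simplest test of that suite. It is unrelated to the GeNI method itself; the function name is an assumption, and 0.01 is the significance level commonly used with the suite.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test.

    Maps bits to +1/-1, sums them, and converts the normalized
    absolute sum into a p-value with the complementary error function."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes_monobit(bits, alpha=0.01):
    """A sequence passes when its p-value is at least alpha."""
    return monobit_p_value(bits) >= alpha
```

A strictly alternating sequence such as `[0, 1] * 500` is fully deterministic yet passes this particular test, because the test only counts how many ones and zeros occur. This is one small illustration of why a single statistical test says little about the underlying entropy source.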
ROI-HIT: Region of Interest-Driven High-Dimensional Microarchitecture Design Space Exploration
IF 2.7 | Zone 3 | Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443006
Xuyang Zhao;Tianning Gao;Aidong Zhao;Zhaori Bi;Changhao Yan;Fan Yang;Sheng-Guo Wang;Dian Zhou;Xuan Zeng
{"title":"ROI-HIT: Region of Interest-Driven High-Dimensional Microarchitecture Design Space Exploration","authors":"Xuyang Zhao;Tianning Gao;Aidong Zhao;Zhaori Bi;Changhao Yan;Fan Yang;Sheng-Guo Wang;Dian Zhou;Xuan Zeng","doi":"10.1109/TCAD.2024.3443006","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443006","url":null,"abstract":"Exploring the design space of RISC-V processors faces significant challenges due to the vastness of the high-dimensional design space and the associated expensive simulation costs. This work proposes a region of interest (ROI)-driven method, which focuses on the promising ROIs to reduce the over-exploration on the huge design space and improve the optimization efficiency. A tree structure based on self-organizing map (SOM) networks is proposed to partition the design space into ROIs. To reduce the high dimensionality of design space, a variable selection technique based on a sensitivity matrix is developed to prune unimportant design parameters and efficiently hit the optimum inside the ROIs. Moreover, an asynchronous parallel strategy is employed to further save the time taken by simulations. Experimental results demonstrate the superiority of our proposed method, achieving improvements of up to 43.82% in performance, 33.20% in power consumption, and 11.41% in area compared to state-of-the-art methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4178-4189"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0