IEEE Computer Architecture Letters: Latest Articles

Architectural Security Regulation
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-10-31 DOI: 10.1109/LCA.2023.3327952
Adam Hastings; Ryan Piersma; Simha Sethumadhavan
Abstract: Across the world, governments are instituting regulations with the goal of improving the state of computer security. In this paper, we propose how security regulation can be formulated and implemented at the architectural level. Our proposal, called FAIRSHARE, requires architects to spend a pre-determined fraction of system resources (e.g., execution cycles) towards security but leaves the decision of how and where to spend this budget up to the architects of these systems. We discuss how this can elevate security and outline the key architectural support necessary to implement such a solution. Our work is the first work at the intersection of architecture and regulation.
Citations: 0
A Quantum Computer Trusted Execution Environment
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-10-19 DOI: 10.1109/LCA.2023.3325852
Theodoros Trochatos; Chuanqi Xu; Sanjay Deshpande; Yao Lu; Yongshan Ding; Jakub Szefer
Abstract: We present the first architecture for a trusted execution environment for quantum computers. In the architecture, to protect the user's circuits, they are obfuscated with decoy control pulses added during circuit transpilation by the user. The decoy pulses are removed, i.e. attenuated, by the trusted hardware inside the superconducting quantum computer's fridge before they reach the qubits. This preliminary work demonstrates that protection from possibly malicious cloud providers is feasible with minimal hardware cost.
Citations: 0
Architectural Implications of GNN Aggregation Programming Abstractions
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-10-19 DOI: 10.1109/LCA.2023.3326170
Yingjie Qi; Jianlei Yang; Ao Zhou; Tong Qiao; Chunming Hu
Abstract: Graph neural networks (GNNs) have gained significant popularity due to the powerful capability to extract useful representations from graph data. As the need for efficient GNN computation intensifies, a variety of programming abstractions designed for optimizing GNN Aggregation have emerged to facilitate acceleration. However, there is no comprehensive evaluation and analysis upon existing abstractions, thus no clear consensus on which approach is better. In this letter, we classify existing programming abstractions for GNN Aggregation by the dimension of data organization and propagation method. By constructing these abstractions on a state-of-the-art GNN library, we perform a thorough and detailed characterization study to compare their performance and efficiency, and provide several insights on future GNN acceleration based on our analysis.
Citations: 0
A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-10-13 DOI: 10.1109/LCA.2023.3323482
Hailong Li; Jaewan Choi; Yongsuk Kwon; Jung Ho Ahn
Abstract: Transformer-based models have become the backbone of numerous state-of-the-art natural language processing (NLP) tasks, including large language models. Matrix multiplication, a fundamental operation in the Transformer-based models, accounts for most of the execution time. While singular value decomposition (SVD) can accelerate this operation by reducing the amount of computation and memory footprints through rank size reduction, it leads to degraded model quality due to challenges in preserving important information. Moreover, this method does not effectively utilize the resources of modern GPUs. In this paper, we propose a hardware-friendly approach: matrix multiplication based on tiled singular value decomposition (TSVD). TSVD divides a matrix into multiple tiles and performs matrix factorization on each tile using SVD. By breaking down the process into smaller regions, TSVD mitigates the loss of important data. We apply the matrices decomposed by TSVD for matrix multiplication, and our TSVD-based matrix multiplication (TSVD-matmul) leverages GPU resources more efficiently compared to the SVD approach. As a result, TSVD-matmul achieved a speedup of 1.03× to 3.24× compared to the SVD approach at compression ratios ranging from 2 to 8. When deployed to GPT-2, TSVD not only performs competitively with a full fine-tuning on the E2E NLG task but also achieves a speedup of 1.06× to 1.24× at 2 to 8 compression ratios while increasing accuracy by up to 1.5 BLEU score.
Citations: 0
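The TSVD abstract above compares plain and tiled SVD at equal compression ratios. As a rough illustration of what that implies for the rank kept per tile, the sketch below works through the parameter-count arithmetic only. It assumes (this is not stated in the abstract) that a rank-k factorization of an a × b block stores k·(a + b) values, that tiles are equal-sized, and that the compression ratio is the original parameter count divided by the stored count. The function names and the 768 × 3072 shape are hypothetical choices for illustration, not taken from the paper.

```python
def svd_rank(m, n, ratio):
    """Largest rank r with r * (m + n) <= m * n / ratio (plain SVD, assumed storage model)."""
    return (m * n) // (ratio * (m + n))


def tiled_rank(m, n, ratio, tiles_r, tiles_c):
    """Largest per-tile rank k for a tiles_r x tiles_c grid of equal tiles.

    Each tile is (m / tiles_r) x (n / tiles_c) and stores k * (m / tiles_r + n / tiles_c)
    values, so all tiles together store k * (tiles_c * m + tiles_r * n) values.
    """
    return (m * n) // (ratio * (tiles_c * m + tiles_r * n))


def stored_params(m, n, k, tiles_r=1, tiles_c=1):
    """Total stored values for rank-k factors over the whole tile grid."""
    return k * (tiles_c * m + tiles_r * n)


# Illustrative shape and ratio (assumptions, not values from the paper).
m, n, ratio = 768, 3072, 2
r_full = svd_rank(m, n, ratio)          # 307
k_tile = tiled_rank(m, n, ratio, 2, 2)  # 153
```

Under these assumptions, a 2 × 2 tiling at the same compression ratio keeps roughly half the rank per tile, while each tile is also half as tall and half as wide as the original matrix.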
Inter-Temperature Bandwidth Reduction in Cryogenic QAOA Machines
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-10-09 DOI: 10.1109/LCA.2023.3322700
Yosuke Ueno; Yuna Tomida; Teruo Tanimoto; Masamitsu Tanaka; Yutaka Tabuchi; Koji Inoue; Hiroshi Nakamura
Abstract: The bandwidth limit between cryogenic and room-temperature environments is a critical bottleneck in superconducting noisy intermediate-scale quantum computers. This paper presents the first trial of algorithm-aware system-level optimization to solve this issue by targeting the quantum approximate optimization algorithm. Our counter-based cryogenic architecture using single-flux quantum logic shows exponential bandwidth reduction and decreases heat inflow and peripheral power consumption of inter-temperature cables, which contributes to the scalability of superconducting quantum computers.
Citations: 0
NoHammer: Preventing Row Hammer With Last-Level Cache Management
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-09-29 DOI: 10.1109/LCA.2023.3320670
Seunghak Lee; Ki-Dong Kang; Gyeongseo Park; Nam Sung Kim; Daehoon Kim
Abstract: Row Hammer (RH) is a circuit-level phenomenon where repetitive activation of a DRAM row causes bit-flips in adjacent rows. Prior studies that rely on extra refreshes to mitigate RH vulnerability demonstrate that bit-flips can be prevented effectively. However, its implementation is challenging due to the significant performance degradation and energy overhead caused by the additional extra refresh for the RH mitigation. To overcome challenges, some studies propose techniques to mitigate the RH attack without relying on extra refresh. These techniques include delaying the activation of an aggressor row for a certain amount of time or swapping an aggressor row with another row to isolate it from victim rows. Although such techniques do not require extra refreshes to mitigate RH, the activation delaying technique may result in high-performance degradation in false-positive cases, and the swapping technique requires high storage overheads to track swap information. We propose NoHammer, an efficient RH mitigation technique to prevent the bit-flips caused by the RH attack by utilizing Last-Level Cache (LLC) management. NoHammer temporarily extends the associativity of the cache set that is being targeted by utilizing another cache set as the extended set and keeps the cache lines of aggressor rows on the extended set under the eviction-based RH attack. Along with the modification of the LLC replacement policy, NoHammer ensures that the aggressor row's cache lines are not evicted from the LLC under the RH attack. In our evaluation, we demonstrate that NoHammer gives 6% higher performance than a baseline without any RH mitigation technique by replacing excessive cache misses caused by the RH attack with LLC hits through sophisticated LLC management, while requiring 45% less storage than prior proposals.
Citations: 0
Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-09-25 DOI: 10.1109/LCA.2023.3318857
Pau Escofet; Anabel Ovide; Carmen G. Almudever; Eduard Alarcón; Sergi Abadal
Abstract: Modular quantum computing architectures offer a promising alternative to monolithic designs for overcoming the scaling limitations of current quantum computers. To achieve scalability beyond small prototypes, quantum architectures are expected to adopt a modular approach, featuring clusters of tightly connected quantum bits with sparser connections between these clusters. Efficiently distributing qubits across multiple processing cores is critical for improving quantum computing systems’ performance and scalability. To address this challenge, we propose the Hungarian Qubit Assignment (HQA) algorithm, which leverages the Hungarian algorithm to improve qubit-to-core assignment. The HQA algorithm considers the interactions between qubits over the entire circuit, enabling fine-grained partitioning and enhanced qubit utilization. We compare the HQA algorithm with state-of-the-art alternatives through comprehensive experiments using both real-world quantum algorithms and random quantum circuits. The results demonstrate the superiority of our proposed approach, outperforming existing methods, with an average improvement of 1.28×.
Citations: 1
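The HQA abstract above names the Hungarian algorithm as its building block for qubit-to-core assignment. The sketch below shows only that building block, a standard O(n³) Hungarian solver for a rectangular cost matrix, plus a small wrapper that handles core capacity by expanding each core into identical slots. Everything specific to the mapping is an assumption for illustration: the abstract does not say how HQA builds its costs or handles capacity, and the names `hungarian`, `assign_qubits`, and the example cost values are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows to distinct columns (rows <= columns).

    Returns (assignment, total) where assignment[i] is the column chosen for row i.
    Standard potentials-based Hungarian algorithm with 1-based internal indices.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials
    v = [0] * (m + 1)      # column potentials
    p = [0] * (m + 1)      # p[j]: row currently matched to column j (0 = free)
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:         # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment, -v[0]


def assign_qubits(cost_qubit_core, capacity):
    """Assign each qubit to a core, at most `capacity` qubits per core.

    cost_qubit_core[q][c] is a hypothetical cost of placing qubit q on core c.
    Each core is expanded into `capacity` identical slots (an assumed modelling choice).
    """
    expanded = [[row[c] for c in range(len(row)) for _ in range(capacity)]
                for row in cost_qubit_core]
    slots, total = hungarian(expanded)
    return [s // capacity for s in slots], total


# Hypothetical example: 4 qubits, 2 cores, 2 qubits per core.
example_cost = [[0, 3], [0, 2], [1, 0], [0, 4]]
cores, total_cost = assign_qubits(example_cost, capacity=2)
```

In this example three qubits prefer core 0 but only two fit, so the solver moves the qubit whose relocation is cheapest. Expanding cores into slots keeps the problem a plain one-to-one assignment, at the price of a larger matrix.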
Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-09-22 DOI: 10.1109/LCA.2023.3306326
Yonghae Kim; Anurag Kar; Jaewon Lee; Jaekyu Lee; Hyesoon Kim
Abstract: Software attacks typically operate by overwriting control data, such as a return address and a function pointer, and hijacking the control flow of a program. To prevent such attacks, a number of control-flow integrity (CFI) solutions have been proposed. Nevertheless, most prior work finds difficulties in serving two ends: performance and security. In particular, protecting forward edges, i.e., indirect calls, remains challenging to solve without trading off one for another. In this work, we propose Code-Pointer Tagging (CPT), a novel dynamic CFI solution combined with cryptographic protection. Our key observation is that a pointer's message authentication code (MAC) can be associated with the pointer's CFI label used for CFI checks. We find that such an approach not only enables a space-efficient control-flow graph (CFG) storage but also achieves highly-efficient CFI checks performed along with implicit pointer authentication. To enable CPT, we implement lightweight compiler and hardware support. We prototype our design in an FPGA-accelerated RISC-V hardware simulation platform and conduct full-system-level evaluations. Our results show that CPT incurs a 1.2% average slowdown on the SPEC CPU C/C++ benchmarks while providing effective layered hardening on forward-edge CFI.
Citations: 0
Fast Performance Prediction for Efficient Distributed DNN Training
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-09-18 DOI: 10.1109/LCA.2023.3316452
Yugyoung Yun; Eunhyeok Park
Abstract: Training large-scale DNN models requires parallel distributed training using hyper-scale systems. To make the best use of the numerous accelerators, it is essential to intelligently combine different parallelization schemes. However, as the size of DNN models increases, the possible combinations of schemes become enormous, and consequently, finding the optimal parallel plan becomes exceedingly expensive and practically unfeasible. In this letter, we introduce a novel cost model, the Markovian Performance Estimator (MPE). This model provides affordable estimates of the throughput of various parallel plans, promoting efficient and fast searches for the ideal parallel plan, even when resources are limited. Significantly, this work is pioneering in explaining the expensive nature of searching for an optimal plan and addressing it using intuitive performance estimations based on real device evaluations. Our experiments demonstrate the effectiveness of the MPE, revealing that it accelerates the optimization process up to 126x faster (36.4 on average) than the existing state-of-the-art baseline, Alpa.
Citations: 0
Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-09-08 DOI: 10.1109/LCA.2023.3313203
Shiqing Zhang; Mahmood Naderan-Tahan; Magnus Jahre; Lieven Eeckhout
Abstract: MCM-GPUs scale performance by integrating multiple chiplets within the same package. How to partition the aggregate compute resources across chiplets poses a fundamental trade-off in performance versus cost and sustainability. We propose the Performance Per Wafer (PPW) metric to explore this trade-off and we find that while performance is maximized with few large chiplets, and while cost and environmental footprint is minimized with many small chiplets, the optimum balance is achieved with a moderate number of medium-sized chiplets. The optimum number of chiplets depends on the workload and increases with increased inter-chiplet bandwidth.
Citations: 0