IEEE Computer Architecture Letters: Latest Articles

Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-12-11 | DOI: 10.1109/LCA.2023.3341389
Courtney Golden;Dan Ilan;Caroline Huang;Niansong Zhang;Zhiru Zhang;Christopher Batten
Abstract: Recent work has explored compute-in-SRAM as a promising approach to overcome the traditional processor-memory performance gap. The recently released Associative Processing Unit (APU) from GSI Technology is, to our knowledge, the first commercial compute-in-SRAM accelerator. Prior work on this platform has focused on domain-specific acceleration using direct microcode programming and/or specialized libraries. In this letter, we demonstrate the potential for supporting a more general-purpose vector abstraction on the APU. We implement a virtual vector instruction set based on the recently proposed RISC-V Vector (RVV) extensions, analyze tradeoffs in instruction implementations, and perform detailed instruction microbenchmarking to identify performance benefits and overheads. This work is a first step towards general-purpose computing on domain-specific compute-in-SRAM accelerators.
Citations: 0
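
To make the idea of a vector abstraction over row-parallel SRAM compute concrete, here is a minimal Python sketch of an RVV-style vadd.vv lowered onto a bit-serial, lane-parallel model. The BitSerialArray class, its register-file layout, and the 32-bit width are assumptions for illustration only; nothing here uses GSI's actual APU microcode, libraries, or the letter's implementation.

# Hypothetical sketch: lowering an RVV-style vadd.vv onto a bit-serial,
# lane-parallel compute model (one vector element per lane).
import numpy as np

class BitSerialArray:
    """Toy model: each of the `lanes` columns holds one 32-bit vector element."""
    def __init__(self, lanes, width=32):
        self.lanes, self.width = lanes, width
        self.regs = {}                          # virtual vector register file

    def load(self, vreg, values):
        self.regs[vreg] = np.asarray(list(values), dtype=np.uint32)

    def vadd_vv(self, vd, vs1, vs2):
        # A real compute-in-SRAM device would ripple through bit positions;
        # we reproduce the same result one bit-plane at a time.
        a, b = self.regs[vs1], self.regs[vs2]
        carry = np.zeros(self.lanes, dtype=np.uint32)
        result = np.zeros(self.lanes, dtype=np.uint32)
        for bit in range(self.width):
            ab = (a >> bit) & 1
            bb = (b >> bit) & 1
            s = ab ^ bb ^ carry                 # full-adder sum for this bit
            carry = (ab & bb) | (carry & (ab ^ bb))
            result |= s << bit
        self.regs[vd] = result

arr = BitSerialArray(lanes=8)
arr.load("v1", range(8))
arr.load("v2", [10] * 8)
arr.vadd_vv("v0", "v1", "v2")
print(arr.regs["v0"])                           # [10 11 12 13 14 15 16 17]
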
Exploiting Intrinsic Redundancies in Dynamic Graph Neural Networks for Processing Efficiency
IF 1.4 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-12-07 | DOI: 10.1109/LCA.2023.3340504
Deniz Gurevin;Caiwen Ding;Omer Khan
Abstract: Modern dynamical systems are rapidly incorporating artificial intelligence to improve the efficiency and quality of complex predictive analytics. To efficiently operate on increasingly large datasets and intrinsically dynamic non-Euclidean data structures, the computing community has turned to Graph Neural Networks (GNNs). We make a key observation that existing GNN processing frameworks do not efficiently handle the intrinsic dynamics in modern GNNs. Dynamic GNN processing operates on the complete static graph at each time step, leading to repetitive redundant computations that introduce tremendous under-utilization of system resources. We propose a novel dynamic graph neural network (DGNN) processing framework that captures the dynamically evolving dataflow of the GNN semantics, i.e., graph embeddings and sparse connections between graph nodes. The framework identifies intrinsic redundancies in node connections and captures representative node-sparse graph information that is readily ingested for processing by the system. Our evaluation on an NVIDIA GPU shows up to 3.5× speedup over the baseline setup that processes all nodes at each time step.
Citations: 0
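
As a rough illustration of why per-time-step redundancy exists, the sketch below (our own, not the paper's framework) conservatively marks the nodes whose 1-hop aggregation could change between two graph snapshots; every other node could reuse its previous embedding. The function name, the edge-set representation, and the single-layer/1-hop assumption are illustrative.

# Conservative over-approximation of the nodes that need recomputation at
# time step t1 for a 1-layer GNN: nodes with changed features, endpoints of
# added/removed edges, and their current neighbors.
def dirty_nodes(edges_t0, edges_t1, changed_features):
    touched = set(changed_features)
    for e in edges_t0 ^ edges_t1:          # edges added or removed
        touched.update(e)
    dirty = set(touched)
    for u, v in edges_t1:                  # neighbors of touched nodes re-aggregate
        if u in touched or v in touched:
            dirty.update((u, v))
    return dirty

edges_t0 = {(0, 1), (1, 2), (2, 3)}
edges_t1 = {(0, 1), (1, 2), (2, 3), (3, 4)}   # one new edge
print(dirty_nodes(edges_t0, edges_t1, changed_features=set()))
# {2, 3, 4} -> nodes 0 and 1 reuse their embeddings from the previous step
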
Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-12-06 | DOI: 10.1109/LCA.2023.3340030
Ramin Ayanzadeh;Moinuddin Qureshi
Abstract: Analog Quantum Computers (QCs), such as D-Wave's Quantum Annealers (QAs) and QuEra's neutral atom platform, rival their digital counterparts in computing power. Existing QAs boast over 5,700 qubits, but their single-instruction operation model prevents using SWAP operations for making physically distant qubits adjacent. Instead, QAs use an embedding process to chain multiple physical qubits together, representing a program qubit with higher connectivity and reducing effective QA capacity by up to 33x. We observe that, post-embedding, nearly 25% of physical qubits remain unused, becoming trapped between chains. Additionally, we observe a "Power-Law" distribution in the chain lengths, where a few dominant chains possess significantly more qubits, thereby exerting a disproportionate impact on both qubit utilization and isolation. Leveraging these insights, we propose Skipper, a software technique designed to enhance the capacity and fidelity of QAs by skipping dominant chains and substituting their program qubit with two measurement outcomes. Using a 5761-qubit QA, we observed that by skipping up to eleven chains, the capacity increased by up to 59% (avg 28%), and the error decreased by up to 44% (avg 33%).
Citations: 0
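
A small sketch of the chain-skipping idea as the abstract describes it: drop the k longest embedding chains, enumerate both values of each skipped program qubit, and keep the lowest-energy result. The QUBO energy function, the brute-force stand-in for the annealer, and all names are assumptions for illustration; this is not the authors' Skipper implementation.

from itertools import product

def qubo_energy(Q, x):
    return sum(Q[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))

def solve_with_skipping(Q, chain_lengths, solver, k=1):
    # pick the k "dominant" (longest) chains to prune
    skipped = sorted(range(len(chain_lengths)),
                     key=lambda q: chain_lengths[q], reverse=True)[:k]
    best = None
    for assignment in product((0, 1), repeat=k):       # 2^k sub-problems
        fixed = dict(zip(skipped, assignment))
        x = solver(Q, fixed)                           # "anneal" with qubits fixed
        if best is None or qubo_energy(Q, x) < qubo_energy(Q, best):
            best = x
    return best

def brute_force(Q, fixed):
    # Exhaustive stand-in for the annealer over the remaining free qubits.
    n = len(Q)
    free = [q for q in range(n) if q not in fixed]
    best = None
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for q, b in zip(free, bits):
            x[q] = b
        for q, b in fixed.items():
            x[q] = b
        if best is None or qubo_energy(Q, x) < qubo_energy(Q, best):
            best = x
    return best

Q = [[-1, 2, 0], [0, -1, 2], [0, 0, -1]]
print(solve_with_skipping(Q, chain_lengths=[3, 9, 2], solver=brute_force, k=1))
# [1, 0, 1]: the longest chain (program qubit 1) was skipped and fixed to 0
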
Tulip: Turn-Free Low-Power Network-on-Chip
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-12-05 | DOI: 10.1109/LCA.2023.3339646
Atiyeh Gheibi-Fetrat;Negar Akbarzadeh;Shaahin Hessabi;Hamid Sarbazi-Azad
Abstract: The semiconductor industry has seen significant technological advancements, leading to an increase in the number of processing cores in a system-on-chip (SoC). To facilitate communication among the numerous on-chip cores, a network-on-chip (NoC) is employed. One of the main challenges of designing NoCs is power management, since the NoC consumes a significant portion of the total power of the SoC. Among the power-intensive components of the NoC, routers stand out. We observe that some power-intensive components of routers, those responsible for implementing turns in the mesh topology, are underutilized compared to others. Therefore, we propose Tulip, a turn-free low-power network-on-chip that avoids within-router turns by removing the corresponding components from the router structure. On a turn (e.g., at the end of the current dimension), Tulip forces the packet to be ejected and then reinjects it into the next dimension channel (i.e., the beginning of the path along the next dimension). Due to its deadlock-free nature, Tulip's scheme may be used orthogonally with any deterministic, partially-adaptive, and fully-adaptive routing algorithms, and can easily be extended to any n-dimensional mesh topology. Our analysis reveals that Tulip can reduce static power and area by 24%-50% and 25%-55%, respectively, for 2D-5D mesh routers.
Citations: 0
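
The eject-and-reinject behavior is easy to picture with a toy routing function: the packet rides straight along X, leaves the network at the turn node, and re-enters on the Y channel, so no router ever performs an in-router turn. This is our own behavioral sketch of a 2D mesh path, not the paper's RTL or its power model.

def turn_free_route(src, dst):
    """Event trace for a packet from src to dst on a 2D mesh, Tulip-style."""
    (sx, sy), (dx, dy) = src, dst
    events = []
    # Phase 1: ride straight along the X dimension.
    step = 1 if dx >= sx else -1
    x = sx
    while x != dx:
        events.append(("hop_x", (x, sy), (x + step, sy)))
        x += step
    # At the end of the X dimension, eject the packet and reinject it into
    # the Y channel instead of turning inside the router.
    if sx != dx and sy != dy:
        events.append(("eject", (dx, sy)))
        events.append(("reinject", (dx, sy)))
    # Phase 2: ride straight along the Y dimension.
    step = 1 if dy >= sy else -1
    y = sy
    while y != dy:
        events.append(("hop_y", (dx, y), (dx, y + step)))
        y += step
    events.append(("deliver", (dx, dy)))
    return events

for ev in turn_free_route((0, 0), (2, 3)):
    print(ev)
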
FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-11-28 | DOI: 10.1109/LCA.2023.3336841
Hyeseong Kim;Yunjae Lee;Minsoo Rhu
Abstract: Deep neural network (DNN)-based recommendation systems (RecSys) are among the most successfully deployed machine learning applications in commercial services for predicting ad click-through rates or rankings. While numerous prior works have explored hardware and software solutions to reduce RecSys training time, the end-to-end training pipeline, including the data preprocessing stage, has received little attention. In this work, we provide a comprehensive analysis of RecSys data preprocessing and root-cause the feature generation and normalization steps as the major performance bottleneck. Based on our characterization, we explore the efficacy of an FPGA-accelerated RecSys preprocessing system that achieves a significant 3.4–12.1× end-to-end speedup compared to the baseline CPU-based RecSys preprocessing system.
Citations: 0
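
For readers unfamiliar with the preprocessing stage, the sketch below shows the two steps the letter identifies as the bottleneck in generic form: feature generation (hashing sparse categorical fields into bounded indices) and normalization of dense counters. The hash choice, field names, and log transform are common DLRM-style conventions assumed here, not the authors' exact pipeline or their FPGA design.

import math
import zlib

NUM_BUCKETS = 1 << 20                    # size of each sparse-feature hash space

def generate_sparse(sample, categorical_fields):
    """Map raw categorical strings to bounded embedding-table indices."""
    return [zlib.crc32(str(sample[f]).encode()) % NUM_BUCKETS
            for f in categorical_fields]

def normalize_dense(sample, dense_fields):
    """Compress heavy-tailed counters with a log transform."""
    return [math.log1p(max(0.0, float(sample[f]))) for f in dense_fields]

sample = {"ad_id": "campaign_42", "device": "phone", "clicks": 17, "dwell_ms": 5300}
print(generate_sparse(sample, ["ad_id", "device"]))
print(normalize_dense(sample, ["clicks", "dwell_ms"]))
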
Redundant Array of Independent Memory Devices
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-11-20 | DOI: 10.1109/LCA.2023.3334989
Peiyun Wu;Trung Le;Zhichun Zhu;Zhao Zhang
Abstract: DRAM memory reliability is increasingly a concern, as recent studies have found. In this letter, we propose RAIMD (Redundant Array of Independent Memory Devices), an energy-efficient memory organization with RAID-like error protection. In this organization, each memory device works as an independent memory module to serve a whole memory request and to support error detection and error recovery. It relies on the high data rate of modern memory devices to minimize the performance impact of increased data transfer time. RAIMD provides chip-level error protection similar to Chipkill but with significant energy savings. Our simulation results indicate that RAIMD can save memory energy by 26.3% on average with a small performance overhead of 5.3% on DDR5-4800 memory systems for SPEC2017 multi-core workloads.
Citations: 0
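
A toy model of the RAID-like organization the abstract describes: each data block is served entirely by one independent device, one extra device holds the XOR parity of the stripe, and a failed device's block is rebuilt from the survivors. Stripe width, block size, and the dict-based "devices" are illustrative assumptions, not RAIMD's actual mapping or timing behavior.

from functools import reduce

BLOCK = 64                                       # bytes served entirely by one device

def parity(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def write_stripe(devices, stripe_id, blocks):
    assert len(blocks) == len(devices) - 1
    for dev, blk in zip(devices[:-1], blocks):   # each block lands on one device
        dev[stripe_id] = blk
    devices[-1][stripe_id] = parity(blocks)      # parity device

def read_block(devices, stripe_id, idx):
    blk = devices[idx].get(stripe_id)
    if blk is not None:
        return blk                               # common case: one device, one access
    survivors = [d[stripe_id] for i, d in enumerate(devices) if i != idx]
    return parity(survivors)                     # rebuild from the other devices

devices = [dict() for _ in range(4)]             # 3 data devices + 1 parity device
write_stripe(devices, 0, [bytes([i] * BLOCK) for i in (1, 2, 3)])
devices[1].clear()                               # simulate a failed device
assert read_block(devices, 0, 1) == bytes([2] * BLOCK)
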
Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-11-17 | DOI: 10.1109/LCA.2023.3333759
Haocong Luo;Yahya Can Tuğrul;F. Nisa Bostancı;Ataberk Olgun;A. Giray Yağlıkçı;Onur Mutlu
Abstract: We present Ramulator 2.0, a highly modular and extensible DRAM simulator that enables rapid and agile implementation and evaluation of design changes in the memory controller and DRAM to meet the increasing research effort in improving the performance, security, and reliability of memory systems. Ramulator 2.0 abstracts and models key components in a DRAM-based memory system and their interactions into shared interfaces and independent implementations. Doing so enables easy modification and extension of the modeled functions of the memory controller and DRAM in Ramulator 2.0. The DRAM specification syntax of Ramulator 2.0 is concise and human-readable, facilitating easy modifications and extensions. Ramulator 2.0 implements a library of reusable templated lambda functions to model the functionalities of DRAM commands to simplify the implementation of new DRAM standards, including DDR5, LPDDR5, HBM3, and GDDR6. We showcase Ramulator 2.0's modularity and extensibility by implementing and evaluating a wide variety of RowHammer mitigation techniques that require different memory controller design changes. These techniques are added modularly as separate implementations without changing any code in the baseline memory controller implementation. Ramulator 2.0 is rigorously validated and maintains a fast simulation speed compared to existing cycle-accurate DRAM simulators.
Citations: 0
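
The interface/implementation split is the key software idea here; the sketch below shows that pattern in Python with an invented RefreshBasedDefense interface and a simple activation-counter mitigation plugged into a baseline controller without modifying it. It mirrors the described structure only; Ramulator 2.0 itself is written in C++ and its real class and method names differ.

class RefreshBasedDefense:                      # shared interface
    def on_activate(self, bank, row): ...

class NoDefense(RefreshBasedDefense):           # baseline implementation
    def on_activate(self, bank, row): pass

class ActivationCounter(RefreshBasedDefense):   # one plug-in mitigation
    def __init__(self, threshold):
        self.threshold, self.counts, self.victims = threshold, {}, []
    def on_activate(self, bank, row):
        key = (bank, row)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.threshold:
            self.victims.extend([(bank, row - 1), (bank, row + 1)])
            self.counts[key] = 0                # "refresh" the neighbors, reset

class MemoryController:
    """Baseline controller: unaware of which defense implementation is plugged in."""
    def __init__(self, defense: RefreshBasedDefense):
        self.defense = defense
    def activate(self, bank, row):
        self.defense.on_activate(bank, row)     # hook point
        # ... issue the ACT command to the DRAM model ...

ctrl = MemoryController(ActivationCounter(threshold=3))
for _ in range(3):
    ctrl.activate(bank=0, row=7)
print(ctrl.defense.victims)                     # [(0, 6), (0, 8)]
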
Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-11-13 | DOI: 10.1109/LCA.2023.3332318
Jonathan Garcia-Mallen;Shuohao Ping;Alex Miralles-Cordal;Ian Martin;Mukund Ramakrishnan;Yipeng Huang
Abstract: We discuss our preliminary results in building a configurable accelerator for differential equation time stepping and iterative methods for algebraic equations. Relative to prior efforts in building hardware accelerators for numerical methods, our focus is on the following: 1) Demonstrating a higher order of numerical convergence that is needed to actually support existing numerical algorithms. 2) Providing the capacity for wide vectors of variables by keeping the hardware design components as simple as possible. 3) Demonstrating configurable hardware support for a variety of numerical algorithms that form the core of scientific computation libraries. These efforts are toward the goal of making the accelerator democratically accessible by computational scientists.
Citations: 0
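
The phrase "higher order of numerical convergence" has a concrete meaning worth spelling out: halving the step size cuts forward Euler's error roughly 2x (first order) but classical RK4's error roughly 16x (fourth order). The short experiment below demonstrates this on dy/dt = -y; it is standard numerics used for illustration and says nothing about the hardware design itself.

import math

def euler(f, y, t_end, h):
    t = 0.0
    while t < t_end - 1e-12:
        y += h * f(t, y)                   # first-order step
        t += h
    return y

def rk4(f, y, t_end, h):
    t = 0.0
    while t < t_end - 1e-12:
        k1 = f(t, y)                       # classical fourth-order Runge-Kutta
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

f = lambda t, y: -y
exact = math.exp(-1.0)                     # y(1) for dy/dt = -y, y(0) = 1
for h in (0.1, 0.05):
    print(f"h={h:<5} euler err={abs(euler(f, 1.0, 1.0, h) - exact):.2e}"
          f"  rk4 err={abs(rk4(f, 1.0, 1.0, h) - exact):.2e}")
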
gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-11-01 | DOI: 10.1109/LCA.2023.3329443
João Vieira;Nuno Roma;Gabriel Falcao;Pedro Tomás
Abstract: Attaining the performance and efficiency levels required by modern applications often requires the use of application-specific accelerators. However, writing synthesizable Register-Transfer Level code for such accelerators is a complex, expensive, and time-consuming process, which is cumbersome for early architecture development phases. To tackle this issue, a pre-synthesis simulation toolchain is herein proposed that facilitates the early architectural evaluation of complex accelerators aggregated to multi-level memory hierarchies. To demonstrate its usefulness, the proposed gem5-accel is used to model a tensor accelerator based on Gemmini, showing that it can successfully anticipate the results of complex hardware accelerators executing deep neural networks.
Citations: 0
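
To give a sense of what a pre-RTL estimate looks like before any synthesizable code exists, here is a back-of-envelope model of a tiled GEMM on a Gemmini-like 16x16 systolic array. Every parameter and the cycle formula are our illustrative assumptions, not gem5-accel's model or its validated numbers.

import math

def gemm_estimate(M, N, K, pe_dim=16, spad_kib=256, bytes_per_elem=1):
    """Rough cycle and data-movement estimate for a tiled M x K x N GEMM."""
    tiles = math.ceil(M / pe_dim) * math.ceil(N / pe_dim) * math.ceil(K / pe_dim)
    # per tile: ~pe_dim cycles to stream operands in plus ~pe_dim to drain
    # results, partially overlapped with the MACs -> ~2*pe_dim cycles per tile
    compute_cycles = tiles * 2 * pe_dim
    dram_bytes = (M * K + K * N + M * N) * bytes_per_elem       # ignores reuse
    return {"tiles": tiles,
            "approx_cycles": compute_cycles,
            "approx_dram_bytes": dram_bytes,
            "fits_in_scratchpad": (M * K + K * N) * bytes_per_elem <= spad_kib * 1024}

print(gemm_estimate(M=128, N=128, K=128))
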
Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations
IF 2.3 | CAS Tier 3 | Computer Science
IEEE Computer Architecture Letters | Pub Date: 2023-10-31 | DOI: 10.1109/LCA.2023.3328824
Loïc France;Florent Bruguier;David Novo;Maria Mushtaq;Pascal Benoit
Abstract: Modern computer memories have been shown to have reliability issues. The main memory is the target of a security threat called Rowhammer, which causes bit flips in victim cells adjacent to aggressor rows. Numerous countermeasures have been proposed, some of the most efficient ones relying on row access counters, with different techniques to reduce the impact on performance, energy consumption, and silicon area. In these proposals, the number of counters is calculated using the maximum number of row activations that can be issued to the protected bank. As reducing the number of counters results in lower silicon area and energy overheads, this can have a direct impact on production and usage costs. In this work, we demonstrate that two of the most efficient countermeasures can have their silicon area overhead reduced by approximately 50% without impacting the protection level by changing their counting granularity.
Citations: 0
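
The sizing rule the abstract refers to (counters provisioned from the maximum number of activations a bank can receive in one refresh window) reduces to simple arithmetic, sketched below. The tREFW/tRC values and the ceil(N_max/threshold) rule are common assumptions used for illustration; the letter's specific granularity change is not reproduced here.

def max_activations_per_window(trefw_ms=64.0, trc_ns=45.0):
    """Upper bound on ACT commands one bank can receive in a refresh window."""
    return int(trefw_ms * 1e6 / trc_ns)

def counters_per_bank(rowhammer_threshold):
    """A tracker needs roughly one counter per `threshold` activations issued."""
    n_max = max_activations_per_window()
    return -(-n_max // rowhammer_threshold)        # ceiling division

n_max = max_activations_per_window()
print(f"max ACTs per bank per 64 ms window: {n_max}")          # ~1.42 million
for threshold in (4096, 2048, 1024):
    print(f"threshold {threshold:>5}: {counters_per_bank(threshold)} counters/bank")
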