IEEE Computer Architecture Letters: Latest Publications

Hardware-Implemented Lightweight Accelerator for Large Integer Polynomial Multiplication
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-03-10 · DOI: 10.1109/LCA.2023.3274931
Pengzhou He; Yazheng Tu; Çetin Kaya Koç; Jiafeng Xie
Abstract: Large integer polynomial multiplication is frequently used as a key component in post-quantum cryptography (PQC) algorithms. Following the trend toward efficient hardware implementation of PQC, in this letter we propose a new lightweight hardware accelerator for the large integer polynomial multiplication of Saber (one of the National Institute of Standards and Technology third-round finalists). First, we provide a derivation of the algorithm for the targeted polynomial multiplication. Then, we map the proposed algorithm onto an optimized hardware accelerator. Finally, we demonstrate the efficiency of the proposed design: e.g., the accelerator with $v=32$ has at least 48.37% lower area-delay product (ADP) than existing designs. The outcome of this work is expected to provide a useful reference for efficient implementations of other PQC algorithms.
Citations: 0
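The letter's derived algorithm and its $v$-way hardware mapping are not spelled out in the abstract. As a point of reference only, here is a minimal Python sketch of the baseline computation being accelerated: schoolbook multiplication in Saber's polynomial ring Z_q[x]/(x^n + 1) with n = 256 and q = 2^13; the function name and structure are ours, not the paper's.

```python
# Minimal sketch (not the paper's algorithm): schoolbook multiplication in
# Z_q[x]/(x^n + 1), the polynomial ring used by Saber with n = 256, q = 2^13.
N, Q = 256, 1 << 13

def poly_mul_negacyclic(a, b, n=N, q=Q):
    """Multiply coefficient lists a, b modulo (x^n + 1) and q."""
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1 in this ring, so the wrapped term flips sign
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c
```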
In-Memory Versioning (IMV)
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-03-05 · DOI: 10.1109/LCA.2023.3273124
David Andrew Roberts; Haojie Ye; Tony Brewer; Sean Eilert
Abstract: In this letter, we propose and evaluate designs for a novel hardware-assisted data versioning system (in-memory versioning, or IMV) in the context of high-performance computing (HPC). Our main novelty and advantage over recently published work is that IMV requires no changes to host processor logic, instead augmenting a memory controller within the memory modules. It is faster and more efficient than existing HPC checkpointing schemes and works at checkpoint intervals ranging from hours down to sub-second. The main premise is to perform most operations in hardware at cache-line granularity, avoiding operating system (OS) latency and page-copying bandwidth overhead. Energy is saved by keeping data movement within the memory module, compared with the page-granularity cross-channel or cross-network copying used today. For a 1-second checkpoint commit interval, we demonstrate up to 20× faster checkpointing and 70× energy savings using IMV versus page copy-on-write (COW).
Citations: 0
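The abstract describes hardware versioning at cache-line granularity inside the memory module. A toy software model of that idea follows; the class name, the undo-log organization, and the 64-byte line size are illustrative assumptions, not the paper's design.

```python
# Toy model of cache-line-granularity versioning kept inside the memory
# module: the first write to a line after a commit saves its pre-image,
# so no OS involvement or page copy is needed.
LINE = 64  # bytes per cache line (assumed)

class VersionedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.undo = {}                  # line index -> pre-image bytes

    def write(self, addr, value: bytes):
        line = addr // LINE
        if line not in self.undo:       # first write since last commit
            start = line * LINE
            self.undo[line] = bytes(self.data[start:start + LINE])
        self.data[addr:addr + len(value)] = value

    def commit(self):                   # checkpoint: discard the undo log
        self.undo.clear()

    def rollback(self):                 # restore the last committed version
        for line, old in self.undo.items():
            self.data[line * LINE:(line + 1) * LINE] = old
        self.undo.clear()
```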
Energy-Efficient Bayesian Inference Using Bitstream Computing
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-02-14 · DOI: 10.1109/LCA.2023.3238584
Soroosh Khoram; Kyle Daruwalla; Mikko Lipasti
Abstract: Uncertainty quantification is critical to many machine learning applications, especially mobile and edge computing tasks such as self-driving cars, robots, and mobile devices. Bayesian neural networks (BNNs) can provide these uncertainty quantifications, but at extra computational cost, while power and energy are limited at the edge. In this work, we propose stochastic bitstream computing substrates for deploying BNNs, which can significantly reduce power and cost. We design our Bayesian Bitstream Processor hardware for an audio classification task as a test case and show that it can outperform a microcontroller baseline by two orders of magnitude in energy and an order of magnitude in delay, at lower power.
Citations: 0
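Bitstream (stochastic) computing, the substrate the letter builds on, encodes a probability p as a random bit stream with P(bit = 1) = p; multiplying two independent streams then reduces to a bitwise AND. A minimal illustration of that encoding, not of the Bayesian Bitstream Processor itself:

```python
# Stochastic bitstream arithmetic: probabilities become random bit streams,
# and multiplication of independent streams is a per-bit AND.
import random

def bitstream(p, n):
    """Encode probability p as n random bits with P(bit = 1) = p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def sc_multiply(p, q, n=10_000):
    a, b = bitstream(p, n), bitstream(q, n)
    return sum(x & y for x, y in zip(a, b)) / n   # estimates p * q

print(sc_multiply(0.8, 0.5))   # ~0.4, up to sampling noise
```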
Intelligent SSD Firmware for Zero-Overhead Journaling
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-02-09 · DOI: 10.1109/LCA.2023.3243695
Hanyeoreum Bae; Donghyun Gouk; Seungjun Lee; Jiseon Kim; Sungjoon Koh; Jie Zhang; Myoungsoo Jung
Abstract: We propose Check0-SSD, an intelligent SSD firmware that offers the best system-level fault tolerance without performance degradation or lifetime shortening. Specifically, the SSD firmware autonomously removes transaction checkpointing, which eliminates redundant writes to the flash backend. To this end, Check0-SSD dynamically classifies journal descriptor/commit requests at runtime and switches the address spaces between journal and data regions by examining the host's filesystem layout and journal-region information in a self-governing manner. Our evaluations demonstrate that Check0-SSD can protect both data and metadata with 89% improved storage lifetime while exhibiting similar or even better performance compared to a norecovery SSD.
Citations: 0
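The abstract describes the mechanism only at a high level. The toy sketch below shows one way the address-space switch could look: journaled copies of data blocks become the live blocks at commit time, so the later checkpoint write-back never happens. All names are hypothetical, and the home_lba shortcut stands in for logic that real firmware would derive by parsing journal descriptor blocks.

```python
# Illustrative sketch (not Check0-SSD's actual firmware): remap journaled
# copies to serve as the live data at commit, eliminating checkpoint writes.
class Check0FTL:
    def __init__(self, journal_lbas):
        self.journal_lbas = set(journal_lbas)  # learned from the fs layout
        self.map = {}        # logical block -> physical flash page
        self.pending = {}    # data lba -> physical page of its journal copy

    def write(self, lba, phys_page, home_lba=None):
        if lba in self.journal_lbas and home_lba is not None:
            # A journaled data block; remember where it will eventually live.
            self.pending[home_lba] = phys_page
        else:
            self.map[lba] = phys_page

    def journal_commit(self):
        # Switch address spaces: journal copies become the live data blocks,
        # so no second (checkpoint) write to the flash backend is needed.
        self.map.update(self.pending)
        self.pending.clear()
```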
Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-02-03 · DOI: 10.1109/LCA.2023.3242178
Daniel A. Jiménez; Elvira Teran; Paul V. Gratz
Abstract: The last-level cache (LLC) is the last chance for memory accesses from the processor to avoid the costly latency of going to main memory. LLC management has been the topic of intense research focusing on two main techniques: replacement and prefetching. However, these two ideas are often evaluated separately, with one studied outside the context of the state of the art in the other. We find that high-performance replacement and highly accurate pattern-based prefetching do not yield synergistic performance improvements: the overhead of complex replacement policies is wasted in the presence of aggressive prefetchers. We find that a simple replacement policy with minimal overhead provides at least the same benefit as a state-of-the-art replacement policy under aggressive pattern-based prefetching. Our proposal uses a genetic algorithm to search the space of insertion and promotion policies that generalize transitions in the recency stack of the least-recently-used (LRU) policy.
Citations: 2
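To make the search space concrete: an insertion/promotion policy over the LRU recency stack can be parameterized by knobs such as an insertion position and a promotion step, which are the kind of genes a genetic algorithm can evolve. The two-knob encoding below is our simplification; the paper's actual policy encoding differs.

```python
# Sketch of a parameterized insertion/promotion policy on one cache set's
# recency stack (index 0 = MRU). insert_pos and promote_by are the knobs.
class RecencyStackSet:
    def __init__(self, ways, insert_pos, promote_by):
        self.stack = []          # list of tags, most-recent first
        self.ways = ways
        self.insert_pos = insert_pos
        self.promote_by = promote_by

    def access(self, tag):
        if tag in self.stack:    # hit: promote toward MRU by a fixed step
            i = self.stack.index(tag)
            self.stack.remove(tag)
            self.stack.insert(max(0, i - self.promote_by), tag)
            return True
        if len(self.stack) >= self.ways:
            self.stack.pop()     # evict the LRU position
        # miss: insert at a chosen stack position instead of always at MRU
        self.stack.insert(min(self.insert_pos, len(self.stack)), tag)
        return False
```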
ADT: Aggressive Demotion and Promotion for Tiered Memory
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-01-13 · DOI: 10.1109/LCA.2023.3236685
Yaebin Moon; Wanju Doh; Kwanhee Kyung; Eojin Lee; Jung Ho Ahn
Abstract: Tiered memory using DRAM as the upper tier (fast memory) and emerging slower-but-larger byte-addressable memory as the lower tier (slow memory) is a promising approach to expanding main-memory capacity. Based on the observation that data-center applications hold many cold pages, proactive demotion schemes demote cold pages to slow memory even when free space in fast memory is not deficient. Prior works on proactive demotion lower the requirement for expensive fast-memory capacity by reducing applications' resident set size in fast memory, and some also mitigate the massive performance drop caused by insufficient fast-memory capacity when demand for hot data spikes. However, there is room to save even more fast-memory capacity through further aggressive demotion, which can fully reap these advantages of proactive demotion. In this paper, we propose a new proactive demotion scheme, ADT, which performs aggressive demotion and promotion for tiered memory. Exploiting the memory-access locality within the unit in which applications and memory allocators allocate memory, ADT extends the unit of demotion/promotion beyond the single page adopted by prior works, making its demotion more aggressive. By demoting and promoting at this extended unit, ADT reduces fast-memory usage by 29% with only a 2.3% performance drop. It also achieves a 2.28× speedup over the default Linux kernel when the system's memory usage exceeds fast-memory capacity, outperforming state-of-the-art tiered-memory management schemes.
Citations: 0
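A toy version of the core idea, demoting at a unit larger than a page: if every page inside an allocation-unit-sized region is cold, the whole region is demoted in one shot. The region size and threshold below are placeholders; ADT derives its unit from how applications and allocators actually allocate memory.

```python
# Sketch of region-granularity (rather than page-granularity) demotion.
REGION_PAGES = 512            # e.g., a 2 MiB region of 4 KiB pages (assumed)

def pick_regions_to_demote(access_counts, cold_threshold=1):
    """access_counts: per-page access counters for pages in fast memory.
    Returns indices of regions whose every page is cold."""
    demote = []
    for r in range(0, len(access_counts), REGION_PAGES):
        region = access_counts[r:r + REGION_PAGES]
        if max(region) < cold_threshold:   # the whole region is cold
            demote.append(r // REGION_PAGES)
    return demote
```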
HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2023-01-04 · DOI: 10.1109/LCA.2022.3233832
Seonho Lee; Ranggi Hwang; Jongse Park; Minsoo Rhu
Abstract: The recent advancement of natural language processing (NLP) models is the result of ever-increasing model sizes and datasets. Most modern NLP models adopt the Transformer architecture, whose main bottleneck is the self-attention mechanism. Because the computation required for self-attention increases rapidly with model size, self-attention has been the main challenge in deploying NLP models. Several prior works have sought to address this bottleneck, but most suffer from significant design overheads and additional training requirements. In this work, we propose HAMMER, a hardware-friendly approximate computing solution for self-attention employing mean-redistribution and linearization, which effectively increases the performance of the self-attention mechanism with low overheads. Compared to previous state-of-the-art self-attention accelerators, HAMMER improves performance by 1.2–1.6× and energy efficiency by 1.2–1.5×.
Citations: 0
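For context, exact self-attention is sketched below: the n×n score matrix and the exponentiation-heavy softmax are the costs that HAMMER's mean-redistribution and linearization approximate away. The approximation itself is the paper's contribution and is not reproduced here.

```python
# Reference (exact) self-attention, to locate the bottleneck HAMMER targets.
import numpy as np

def self_attention(Q, K, V):
    """Q, K, V: (n, d) arrays for a sequence of length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # n x n: the quadratic-cost term
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax row-normalization
    return w @ V
```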
Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2022-12-13 · DOI: 10.1109/LCA.2022.3227643
Burkhard Ringlein; Francois Abel; Dionysios Diamantopoulos; Beat Weiss; Christoph Hagleitner; Dietmar Fey
Abstract: The slowdown of technology scaling combined with the exponential growth of modern machine learning and artificial intelligence models has created demand for specialized accelerators such as GPUs, ASICs, and field-programmable gate arrays (FPGAs). FPGAs can be reconfigured and have the potential to outperform other accelerators while being more energy-efficient, but they are cumbersome to use given today's fractured landscape of tool flows. We propose the concept of an operation set architecture to overcome the current incompatibilities and hurdles in using DNN-to-FPGA compilers, combining existing specialized frameworks into one organic compiler that also allows efficient, automatic reuse of existing community tools. Furthermore, we demonstrate that mixing different existing frameworks can increase efficiency by more than an order of magnitude.
Citations: 1
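One way to picture compiling against an operation set: each operation in the DNN graph is dispatched to whichever existing FPGA framework implements it. The supported-op table below is a made-up placeholder (hls4ml and FINN are real DNN-to-FPGA frameworks, but these entries are not their actual operation sets), and a real compiler would cost-rank candidates rather than take the first.

```python
# Toy dispatch of DNN graph operations to existing FPGA backends.
OP_SETS = {
    "hls4ml": {"dense", "relu"},            # placeholder op sets
    "FINN":   {"conv2d", "maxpool", "relu"},
}

def assign_backends(graph_ops):
    plan = {}
    for op in graph_ops:
        candidates = [fw for fw, ops in OP_SETS.items() if op in ops]
        if not candidates:
            raise ValueError(f"no backend implements {op}")
        plan[op] = candidates[0]   # a real compiler would cost-rank these
    return plan

print(assign_backends(["conv2d", "relu", "dense"]))
```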
CoreNap: Energy Efficient Core Allocation for Latency-Critical Workloads
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2022-12-08 · DOI: 10.1109/LCA.2022.3227629
Gyeongseo Park; Ki-Dong Kang; Minho Kim; Daehoon Kim
Abstract: In data-center servers, dynamic core allocation for latency-critical (LC) applications can play a crucial role in improving energy efficiency under Service Level Objective (SLO) constraints, allowing cores to enter idle states (i.e., C-states) that consume less power by turning off parts of the processor's hardware. However, prior studies focus on core allocation for application threads without considering the cores involved in network packet processing, even though packet processing considerably affects both response latency and energy consumption. In this paper, we first investigate how explicitly allocating cores to network packet processing impacts tail response latency and energy consumption while running LC applications. We observe that co-adjusting the number of cores for packet processing along with the number of cores for LC application threads can substantially improve energy efficiency, compared with adjusting cores only for application threads as prior studies do. We then propose a dynamic core-allocation scheme, CoreNap, which allocates/de-allocates cores for both LC application threads and packet processing. CoreNap measures CPU utilization by application threads and by packet processing individually, and predicts the response latency and power consumption of each candidate core-allocation combination via a lightweight prediction model. Based on the prediction, CoreNap chooses and enforces the energy-efficient combination. Our experimental results show that CoreNap reduces energy consumption by up to 18.6% compared with a state-of-the-art scheme that adjusts cores only for the LC application in parallel packet-processing environments.
Citations: 0
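A sketch of the selection step as we read the abstract: enumerate (application-core, packet-processing-core) pairs, query a latency/power predictor for each, and keep the lowest-power combination that meets the SLO. The predictor, CoreNap's lightweight model, is left as a caller-supplied stub here.

```python
# Toy version of CoreNap's core-allocation choice (predictor stubbed out).
def choose_cores(max_cores, predict, slo_us):
    """predict(app, net) -> (tail_latency_us, power_watts); a stand-in for
    CoreNap's lightweight prediction model."""
    best = None
    for app in range(1, max_cores):
        for net in range(1, max_cores - app + 1):
            lat, watts = predict(app, net)
            if lat <= slo_us and (best is None or watts < best[0]):
                best = (watts, app, net)
    return best   # (power, app_cores, net_cores), or None if SLO infeasible
```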
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications
IF 2.3 · CAS Tier 3 · Computer Science
IEEE Computer Architecture Letters · Pub Date: 2022-12-05 · DOI: 10.1109/LCA.2022.3226482
Joonseop Sim; Soohong Ahn; Taeyoung Ahn; Seungyong Lee; Myunghyun Rhee; Jooyoung Kim; Kwangsik Shin; Donguk Moon; Euiseok Kim; Kyoung Park
Abstract: The CXL interface is an up-to-date technology that enables effective memory expansion by providing a memory-sharing protocol for configuring heterogeneous devices. However, its limited physical bandwidth can become a significant bottleneck for emerging data-intensive applications. In this work, we propose a novel CXL-based memory-disaggregation architecture, with a real-world prototype demonstration, that overcomes the bandwidth limitation of the CXL interface using near-data processing. Experimental results demonstrate that our design achieves up to 1.9× better performance/power efficiency than an existing CPU system.
Citations: 5
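A back-of-envelope model of why near-data processing pays off behind a bandwidth-limited link: a device-side reduction ships one result over CXL instead of every operand. The link-bandwidth figure below is illustrative only, not the prototype's measured number.

```python
# Illustrative transfer-time comparison for a reduction over CXL memory.
def transfer_time_s(bytes_moved, link_gbps=64):   # assumed link bandwidth
    return bytes_moved * 8 / (link_gbps * 1e9)

array_bytes = 4 * 1024**3                    # 4 GiB of operands in CXL memory
host_side   = transfer_time_s(array_bytes)   # pull every operand to the CPU
device_side = transfer_time_s(8)             # ship back one 8-byte sum
print(f"host: {host_side:.3f} s   device: {device_side:.2e} s")
```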