IEEE Computer Architecture Letters: Latest Articles

LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-25 DOI: 10.1109/LCA.2023.3287096
Yun-Chen Lo;Yu-Chih Tsai;Ren-Shuo Liu
Abstract: Computing latency is an important system metric for Deep Neural Networks (DNNs) accelerators. To reduce latency, this work proposes LV, a latency-versatile floating-point engine (FP-PE), which contains the following key contributions: 1) an approximate bit-versatile multiplier-and-accumulate (BV-MAC) unit with early shifter and 2) an on-demand fixed-point-to-floating-point conversion (FXP2FP) unit. The extensive experimental results show that LV outperforms baseline FP-PE and redundancy-aware FP-PE by up to 2.12× and 1.3× speedup using TSMC 40-nm technology, achieving comparable accuracy on the ImageNet classification tasks.
Citations: 0
A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-16 DOI: 10.1109/LCA.2023.3305668
Lingfei Lu;Yudi Qiu;Shiyan Yi;Yibo Fan
Abstract: Personalized recommendation system (RS) is widely used in the industrial community and occupies much time in AI computing centers. A critical component of RS is the embedding layer, which consists of sparse embedding lookups and is memory-bounded. Recent works have proposed near-memory processing (NMP) architectures to utilize high inner-memory bandwidth to speed up embedding lookups. These NMP works divide embedding vectors either horizontally or vertically. However, the effectiveness of horizontal or vertical partitioning is hard to guarantee under different memory configurations or embedding vector sizes. To improve this issue, we propose FeaNMP, a flexible embedding-aware NMP architecture that accelerates the inference phase of RS. We explore different partitioning strategies in detail and design a flexible way to select optimal ones depending on different embedding dimensions and DDR configurations. As a result, compared to the state-of-the-art rank-level NMP work RecNMP, our work achieves up to 11.1× speedup for embedding layers under mix-dimensioned workloads.
Citations: 0
Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-15 DOI: 10.1109/LCA.2023.3305386
Jaewan Choi;Jaehyun Park;Kwanhee Kyung;Nam Sung Kim;Jung Ho Ahn
Abstract: Transformer-based generative models, such as GPT, summarize an input sequence by generating key/value (KV) matrices through attention and generate the corresponding output sequence by utilizing these matrices once per token of the sequence. Both input and output sequences tend to get longer, which improves the understanding of contexts and conversation quality. These models are also typically batched for inference to improve the serving throughput. All these trends enable the models’ weights to be reused effectively, increasing the relative importance of sequence generation, especially in processing KV matrices through attention. We identify that the conventional computing platforms (e.g., GPUs) are not efficient at handling this attention part for inference because each request generates different KV matrices, it has a low operation per byte ratio regardless of the batch size, and the aggregate size of the KV matrices can even surpass that of the entire model weights. This motivates us to propose AttAcc, which exploits the fact that the KV matrices are written once during summarization but used many times (proportional to the output sequence length), each multiplied by the embedding vector corresponding to an output token. The volume of data entering/leaving AttAcc could be more than orders of magnitude smaller than what should be read internally for attention. We design AttAcc with multiple processing-in-memory devices, each multiplying the embedding vector with the portion of the KV matrices within the devices, saving external (inter-device) bandwidth and energy consumption.
Open-access PDF: https://ieeexplore.ieee.org/iel7/10208/10189818/10218731.pdf
Citations: 0
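The operations-per-byte argument in the AttAcc abstract above can be sketched with back-of-the-envelope arithmetic. This is a minimal sketch under stated assumptions, not the paper's model: it assumes 16-bit elements, counts only weight or KV-matrix traffic (activations are ignored), and counts one multiply-accumulate as two operations. The function names and the sizes in the usage lines are hypothetical.

```python
def fc_ops_per_byte(batch, d_model, bytes_per_elem=2):
    """Operations per byte for one d_model x d_model fully connected layer.

    The weight matrix is read once and shared by every request in the batch.
    """
    ops = 2 * batch * d_model * d_model             # one MAC = 2 operations
    bytes_read = d_model * d_model * bytes_per_elem  # weights only (assumption)
    return ops / bytes_read


def attention_ops_per_byte(batch, seq_len, d_model, bytes_per_elem=2):
    """Operations per byte for generating one token with attention.

    Each request has its own K and V matrices (seq_len x d_model each),
    so the bytes read grow with the batch just like the operations do.
    """
    ops = batch * 4 * seq_len * d_model              # q*K^T plus scores*V
    bytes_read = batch * 2 * seq_len * d_model * bytes_per_elem
    return ops / bytes_read


if __name__ == "__main__":
    for batch in (1, 8, 64):
        print(batch,
              fc_ops_per_byte(batch, 4096),
              attention_ops_per_byte(batch, 2048, 4096))
```

Under these assumptions the fully connected layer's ratio grows linearly with the batch size, while the attention ratio stays constant, which matches the abstract's point that batching does not help the attention part.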
Characterizing and Understanding Defense Methods for GNNs on GPUs
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-15 DOI: 10.1109/LCA.2023.3304638
Meng Wu;Mingyu Yan;Xiaocheng Yang;Wenming Li;Zhimin Zhang;Xiaochun Ye;Dongrui Fan
Abstract: Graph neural networks (GNNs) are widely deployed in many vital fields, but suffer from adversarial attacks, which seriously compromise the security in these fields. Plenty of defense methods have been proposed to mitigate the impact of these attacks, however, they have introduced extra time-consuming stages into the execution of GNNs. These extra stages need to be accelerated because the end-to-end acceleration is essential for GNNs to achieve fast development and deployment. To disclose the performance bottlenecks, execution patterns, execution semantics, and overheads of the defense methods for GNNs, we characterize and explore these extra stages on GPUs. Given the characterization and exploration, we provide several useful guidelines for both software and hardware optimizations to accelerate the defense methods for GNNs.
Citations: 0
By-Software Branch Prediction in Loops
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-11 DOI: 10.1109/LCA.2023.3304613
Maziar Goudarzi;Reza Azimi;Julian Humecki;Faizaan Rehman;Richard Zhang;Chirag Sethi;Tanishq Bomman;Yuqi Yang
Abstract: Load-Dependent Branches (LDB) often do not exhibit regular patterns in their local or global history and thus are inherently hard to predict correctly by conventional branch predictors. We propose a software-to-hardware branch pre-resolution mechanism that allows software to pass branch outcomes to the processor frontend ahead of fetching the branch instruction. A compiler pass identifies the instruction chain leading to the branch (the branch backslice) and generates the pre-execute code that produces the branch outcomes ahead of the frontend observing them. The loop structure helps to unambiguously map the branch outcomes to their corresponding dynamic instances of the branch instruction. Our approach also allows for covering the loop iteration space selectively, with arbitrarily complex patterns. Our method for pre-execution enables important optimizations such as unrolling and vectorization, in order to substantially reduce the pre-execution overhead. Experimental results on select workloads from SPEC CPU 2017 and graph analytics workloads show up to 95% reduction of MPKI (21% on average), up to 39% speedup (7% on average), and 23% IPC gain on average, compared to a core with TAGE-SC-L-64KB branch predictor.
Citations: 0
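The backslice idea in the abstract above can be illustrated with a software-only sketch. It is an assumption-laden toy, not the paper's compiler pass or hardware interface: the list returned by the hypothetical `pre_execute` function merely stands in for whatever channel passes outcomes to the processor frontend, and all names and data are made up for illustration.

```python
def baseline(values, index, threshold):
    """Original loop: the branch depends on a value loaded via an index."""
    total = 0
    for i in index:
        if values[i] > threshold:      # load-dependent branch
            total += values[i]
    return total


def pre_execute(values, index, threshold):
    """Backslice only: the load and the compare that feed the branch.

    Produces one outcome per loop iteration, in iteration order, so each
    outcome maps unambiguously to one dynamic instance of the branch.
    """
    return [values[i] > threshold for i in index]


def main_loop(values, index, outcomes):
    """Main loop consuming precomputed outcomes instead of re-deriving them."""
    total = 0
    for i, taken in zip(index, outcomes):
        if taken:                      # outcome already known ahead of time
            total += values[i]
    return total


if __name__ == "__main__":
    values = [5, 1, 9, 3]
    index = [2, 0, 1, 3, 2]
    outcomes = pre_execute(values, index, threshold=4)
    assert main_loop(values, index, outcomes) == baseline(values, index, 4)
```

Because the pre-execute pass contains only the load and the compare, it is the kind of loop that can be unrolled or vectorized independently of the main loop body, which is the overhead reduction the abstract refers to.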
Simulating Our Way to Safer Software: A Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-10 DOI: 10.1109/LCA.2023.3303913
Justin Feng;Fatemeh Arkannezhad;Christopher Ryu;Enoch Huang;Siddhant Gupta;Nader Sehatbakhsh
Abstract: An important step to protect software against side-channel vulnerability is to rigorously evaluate it on the target hardware using standard leakage tests. Recently, leakage estimation tools have received a lot of attention to improve this time-consuming process. Despite their advancements, existing tools often neglect the impact of microarchitecture and its underlying events in their leakage model which leads to inaccuracies. This paper takes the first step in addressing these issues by integrating a physical side-channel leakage estimation tool into a microarchitectural simulator. To achieve this, we first systematically explore the impact of various architecture and microarchitecture activities and their underlying interactions on the produced physical side-channel signals and integrate that into the microarchitecture model. Second, to create a comprehensive leakage estimation report, we leverage taint tracking and symbolic execution to accurately analyze different paths and inputs. The final outcome of this work is a tool that takes a binary and generates a leakage report that covers architecture and microarchitecture-related leakages for both data-dependent and path-dependent information leakage scenarios.
Citations: 0
SoCurity: A Design Approach for Enhancing SoC Security
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-03 DOI: 10.1109/LCA.2023.3301448
Naorin Hossain;Alper Buyuktosunoglu;John-David Wellman;Pradip Bose;Margaret Martonosi
Abstract: We propose SoCurity, the first NoC counter-based hardware monitoring approach for enhancing heterogeneous SoC security. With SoCurity, we develop a fast, lightweight anomalous activity detection system leveraging semi-supervised machine learning models that require no prior attack knowledge for detecting anomalies. We demonstrate our techniques with a case study on a real SoC for a connected autonomous vehicle system and find up to 96% detection accuracy.
Citations: 0
Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-08-01 DOI: 10.1109/LCA.2023.3287976
Seyyed Hossein SeyyedAghaei Rezaei;Parham Zilouchian Moghaddam;Mehdi Modarressi
Abstract: Processing-in-memory (PIM) is the most promising paradigm to address the bandwidth bottleneck in deep neural network (DNN) accelerators. However, the algorithmic and dataflow structure of DNNs still necessitates moving a large amount of data across banks inside the memory device to bring input data and their corresponding model parameters together, negatively shifting part of the bandwidth bottleneck to the in-memory data communication infrastructure. To alleviate this bottleneck, we present Smart Memory, a highly parallel in-memory DNN accelerator for 3D memories that benefits from a scalable high-bandwidth in-memory network. Whereas the existing PIM designs implement the compute units and network-on-chip on the logic die of the underlying 3D memory, in Smart Memory the computation and data transmission tasks are distributed across the memory banks. To this end, each memory bank is equipped with (1) a very simple processing unit to run neural networks, and (2) a circuit-switched router to interconnect memory banks by a 3D network-on-memory. Our evaluation shows 44% average performance improvement over state-of-the-art in-memory DNN accelerators.
Citations: 0
The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging “Attacks” on MIRAGE
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-07-21 DOI: 10.1109/LCA.2023.3297875
Gururaj Saileshwar;Moinuddin Qureshi
Abstract: This letter studies common modeling pitfalls in security analyses of hardware defenses to highlight the importance of accurate reproduction of defenses. We provide a case study of MIRAGE (Saileshwar and Qureshi 2021), a defense against cache side channel attacks, and analyze its incorrect modeling in a recent work (Chakraborty et al., 2023) that claimed to break its security. We highlight several modeling pitfalls that can invalidate the security properties of any defense including a) incomplete modeling of components critical for security, b) usage of random number generators that are insufficiently random, and c) initialization of system to improbable states, leading to an incorrect conclusion of a vulnerability, and show how these modeling bugs incorrectly cause set conflicts to be observed in a recent work’s (Chakraborty et al., 2023) model of MIRAGE. We also provide an implementation addressing these bugs that does not incur set-conflicts, highlighting that MIRAGE is still unbroken.
Citations: 0
Exploring the Latency Sensitivity of Cache Replacement Policies
IF 2.3, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2023-07-19 DOI: 10.1109/LCA.2023.3296251
Ahmed Nematallah;Chang Hyun Park;David Black-Schaffer
Abstract: With DRAM latencies increasing relative to CPU speeds, the performance of caches has become more important. This has led to increasingly sophisticated replacement policies that require complex calculations to update their replacement metadata, which often require multiple cycles. To minimize the negative impact of these metadata updates, architects have focused on policies that incur as little update latency as possible through a combination of reducing the policies’ precision and using parallel hardware. In this work we investigate whether these tradeoffs to reduce cache metadata update latency are needed. Specifically, we look at the performance and energy impact of increasing the latency of cache replacement policy updates. We find that even dramatic increases in replacement policy update latency have very limited effect. This indicates that designers have far more freedom to increase policy complexity and latency than previously assumed.
Citations: 0