IEEE Computer Architecture Letters: Latest Articles

PINSim: A Processing In- and Near-Sensor Simulator to Model Intelligent Vision Sensors
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-12-25 DOI: 10.1109/LCA.2024.3522777
Sepehr Tabrizchi;Mehrdad Morsali;David Pan;Shaahin Angizi;Arman Roohi
Abstract: This letter introduces PINSim, a user-friendly and flexible framework for simulating emerging smart vision sensors in the early design stages. PINSim enables the realization of integrated sensing and processing near and in the sensor, effectively addressing challenges such as data movement and power-hungry analog-to-digital converters. The framework offers a flexible interface and a wide range of design options for customizing the efficiency and accuracy of processing-near/in-sensor-based accelerators using a hierarchical structure. Its organization spans from the device level upward to the algorithm level. PINSim realizes instruction-accurate evaluation of circuit-level performance metrics. PINSim achieves over $25{,}000\times$ speed-up compared to SPICE simulation with less than a 4.1% error rate on average. Furthermore, it supports both multilayer perceptron (MLP) and convolutional neural network (CNN) models, with limitations determined by IoT budget constraints. By facilitating the exploration and optimization of various design parameters, PINSim empowers researchers and engineers to develop energy-efficient and high-performance smart vision sensors for a wide range of applications.
Volume 24, Issue 1, Pages 17-20
Citations: 0
ZoneBuffer: An Efficient Buffer Management Scheme for ZNS SSDs
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-12-16 DOI: 10.1109/LCA.2024.3498103
Hongtao Wang;Peiquan Jin
Abstract: The introduction of Zoned Namespace SSDs (ZNS SSDs) presents new challenges for existing buffer management schemes. In addition to traditional SSD characteristics such as read/write asymmetry and limited write endurance, ZNS SSDs possess unique constraints, such as requiring sequential writes within each zone. These features make conventional buffering policies incompatible with ZNS SSDs. This paper introduces ZoneBuffer, a novel buffering scheme designed specifically for ZNS SSDs. ZoneBuffer's innovation lies in two key aspects. First, it introduces a new buffer structure comprising a Work Region and a Priority Region. The Priority Region is further divided into a clean page queue and a zone cluster of dirty pages. By confining buffer replacement to the Priority Region, ZoneBuffer ensures optimization for ZNS SSDs. Second, ZoneBuffer incorporates a lifetime-based clustering algorithm to group dirty pages within the Priority Region, optimizing write operations. Preliminary experiments conducted on a real ZNS SSD demonstrate the effectiveness of ZoneBuffer. Compared with conventional schemes like LRU and CFLRU, the results indicate that ZoneBuffer significantly improves performance.
Volume 23, Issue 2, Pages 239-242
Citations: 0
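The sequential-write constraint mentioned in the ZoneBuffer abstract can be illustrated with a toy zone model. This is a minimal Python sketch and not the ZoneBuffer scheme itself; the class `Zone`, its methods, and the capacity used are hypothetical names chosen only for illustration.

```python
class Zone:
    """Toy model of one ZNS zone: a write must land exactly at the write pointer."""

    def __init__(self, capacity):
        self.capacity = capacity          # number of blocks in the zone (illustrative)
        self.write_pointer = 0            # index of the next block that may be written
        self.blocks = [None] * capacity

    def write(self, block_index, data):
        # Reject writes once the zone has been filled.
        if self.write_pointer >= self.capacity:
            raise ValueError("zone is full")
        # Reject any write that is not at the current write pointer.
        if block_index != self.write_pointer:
            raise ValueError("non-sequential write rejected")
        self.blocks[block_index] = data
        self.write_pointer += 1

    def reset(self):
        # Rewinding the write pointer is the only way to overwrite earlier blocks.
        self.write_pointer = 0
        self.blocks = [None] * self.capacity
```

Under this model, updating a block in place is impossible without resetting the whole zone, which is why a buffer that evicts dirty pages in arbitrary order fits ZNS devices poorly.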
Straw: A Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-12-12 DOI: 10.1109/LCA.2024.3516205
Myoungjun Chun;Jaeyong Lee;Inhyuk Choi;Jisung Park;Myungsuk Kim;Jihong Kim
Abstract: Although read disturbance has emerged as a major reliability concern, managing read disturbance in modern NAND flash memory has not been thoroughly investigated yet. From a device characterization study using real modern NAND flash memory, we observe that reading a page incurs heterogeneous reliability impacts on each WL, which makes the existing block-level read reclaim extremely inefficient. We propose a new WL-level read-reclaim technique, called Straw, which keeps track of the accumulated read-disturbance effect on each WL and reclaims only heavily-disturbed WLs. By avoiding unnecessary read-reclaim operations, Straw reduces read-reclaim-induced page writes by 83.6% with negligible storage overhead.
Volume 24, Issue 1, Pages 5-8
Citations: 0
Electra: Eliminating the Ineffectual Computations on Bitmap Compressed Matrices
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-12-12 DOI: 10.1109/LCA.2024.3516057
Chaithanya Krishna Vadlamudi;Bahar Asgari
Abstract: The primary computations in several applications, such as deep learning recommendation models, graph neural networks, and scientific computing, involve sparse matrix-sparse matrix multiplications (SpMSpM). Unlike standard multiplications, SpMSpMs introduce ineffective computations that can negatively impact performance. While several accelerators have been proposed to execute SpMSpM more efficiently, they often incur additional overhead in identifying the effectual arithmetic computations. To solve this issue, we propose Electra, a novel approach designed to reduce ineffectual computations in bitmap-compressed matrices. Electra achieves this by i) performing logical operations on the bitmap data to know whether the arithmetic computation has a zero or non-zero value, and ii) implementing finer granular scheduling of non-zero elements to arithmetic units. Our evaluations suggest that on average, Electra achieves a speedup of 1.27× over the state-of-the-art SpMSpM accelerator with a small area overhead of 64.92 $\text{mm}^{2}$ based on a 45 nm process.
Volume 24, Issue 1, Pages 9-12
Citations: 0
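The idea in the Electra abstract of using logical operations on bitmaps to find effectual multiplications can be illustrated in software on a single sparse dot product. The sketch below is an assumption-laden Python illustration, not the Electra hardware or its scheduling; `compress`, `rank`, and `sparse_dot` are hypothetical helper names.

```python
def compress(dense):
    """Return (bitmap, values): bit i of bitmap is set when dense[i] is non-zero."""
    bitmap = 0
    values = []
    for i, x in enumerate(dense):
        if x != 0:
            bitmap |= 1 << i
            values.append(x)
    return bitmap, values


def rank(bitmap, i):
    """Count set bits below position i, which is the index into the packed values."""
    return bin(bitmap & ((1 << i) - 1)).count("1")


def sparse_dot(a, b):
    """Dot product of two bitmap-compressed vectors.

    Only positions set in both bitmaps can yield a non-zero product,
    so a bitwise AND selects the effectual multiplications up front.
    Returns (result, number_of_multiplications_performed).
    """
    bitmap_a, values_a = a
    bitmap_b, values_b = b
    both = bitmap_a & bitmap_b
    total = 0
    multiplications = 0
    while both:
        lowest = both & -both              # isolate the lowest set bit
        i = lowest.bit_length() - 1        # position of that bit
        total += values_a[rank(bitmap_a, i)] * values_b[rank(bitmap_b, i)]
        multiplications += 1
        both ^= lowest                     # clear the bit and continue
    return total, multiplications
```

For the dense vectors `[0, 2, 0, 3]` and `[1, 4, 0, 5]`, the AND of the two bitmaps keeps only positions 1 and 3, so two multiplications are issued instead of four.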
IntervalSim++: Enhanced Interval Simulation for Unbalanced Processor Designs
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-12-09 DOI: 10.1109/LCA.2024.3514917
Haseung Bong;Nahyeon Kang;Youngsok Kim;Joonsung Kim;Hanhwi Jang
Abstract: As processor microarchitecture is getting complicated, an accurate analytic model becomes crucial for exploring large processor design space within limited development time. An interval simulation is a widely used analytic model for processor designs in the early stage. However, it cannot accurately model modern microarchitecture, which has an unbalanced pipeline. In this work, we introduce IntervalSim++, an accurate analytic model for a modern microarchitecture design based on the interval simulation. We identify key components highly related to the unbalanced pipeline and propose new modeling techniques atop the interval simulation without incurring significant overheads. Our evaluations show IntervalSim++ accurately models a modern out-of-order processor with minimal overheads, showing 1% average CPI error and only 8.8% simulation time increase compared to the baseline interval simulation.
Volume 24, Issue 1, Pages 1-4
Citations: 0
SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-11-25 DOI: 10.1109/LCA.2024.3505872
Samuel Coulon;Tianyou Bao;Jiafeng Xie
Abstract: Polynomial multiplication is a key component in many post-quantum cryptography and homomorphic encryption schemes. One recurring variation, ternary polynomial multiplication over ring $\mathbb{Z}_{q}/(x^{n}+1)$ where one input polynomial has ternary coefficients {−1,0,1} and the other has large integer coefficients {0, $q-1$}, has recently drawn significant attention from various communities. Following this trend, this paper presents a novel SCALable and area-Efficient Systolic (SCALES) accelerator for ternary polynomial multiplication. In total, we have carried out three layers of coherent interdependent efforts. First, we have rigorously derived a novel block-processing strategy and algorithm based on the schoolbook method for polynomial multiplication. Then, we have innovatively implemented the proposed algorithm as the SCALES accelerator with the help of a number of field-programmable gate array (FPGA)-oriented optimization techniques. Lastly, we have conducted a thorough implementation analysis to showcase the efficiency of the proposed accelerator. The comparison demonstrated that the SCALES accelerator has at least 19.0% and 23.8% less equivalent area-time product (eATP) than the state-of-the-art designs. We hope this work can stimulate continued research in the field.
Volume 23, Issue 2, Pages 243-246
Citations: 0
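For readers unfamiliar with the operation named in the SCALES abstract, the following is a minimal Python sketch of plain schoolbook ternary polynomial multiplication modulo $x^{n}+1$ and $q$. It is the textbook baseline only, not the block-processing algorithm or the systolic design the authors derive; the function name `ternary_negacyclic_mul` is hypothetical.

```python
def ternary_negacyclic_mul(t, b, q):
    """Multiply t(x) * b(x) modulo (x^n + 1) and modulo q.

    t: list of n coefficients, each in {-1, 0, 1}
    b: list of n coefficients, each in the range 0..q-1
    Because t is ternary, every partial product is an add, a subtract, or a skip.
    """
    n = len(t)
    if len(b) != n:
        raise ValueError("both polynomials must have n coefficients")
    c = [0] * n
    for i, ti in enumerate(t):
        if ti == 0:
            continue                      # zero coefficient contributes nothing
        for j, bj in enumerate(b):
            term = bj if ti == 1 else -bj
            k = i + j
            if k >= n:
                k -= n                    # x^n wraps around to -1
                term = -term
            c[k] = (c[k] + term) % q
    return c
```

With `n = 2` and `q = 7`, `(1 + x)(3 + 4x) = 3 + 7x + 4x^2`, and replacing `x^2` by `-1` gives `-1 + 7x`, which reduces to `[6, 0]`.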
A Case for Hardware Memoization in Server CPUs
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-11-22 DOI: 10.1109/LCA.2024.3505075
Farid Samandi;Natheesan Ratnasegar;Michael Ferdman
Abstract: Server applications exhibit a high degree of code repetition because they handle many similar requests. In turn, repeated execution of the same code, often with identical inputs, highlights an inefficiency in the execution of server software and suggests memoization as a way to improve performance. Memoization has been extensively explored in software, and several hardware- and hardware-assisted memoization schemes have been proposed in the literature. However, these works targeted memoization of mathematical or algorithmic processing, whereas server applications call for a different approach. We observe that the opportunity for memoization in servers arises not from eliminating the repetition of complex computation, but from eliminating the repetition of software orchestration code. This work studies hardware memoization in servers, ultimately focusing on one pattern, instruction sequences starting with indirect jumps. We explore how an out-of-order pipeline can be extended to support memoization of these instruction sequences, demonstrating the potential of hardware memoization for servers. Using 26 applications to make our case (3 CloudSuite workloads and 23 vSwarm serverless functions), we show how targeting just this one pattern of instruction sequences can memoize over 10% (up to 15.6%) of the dynamically executed instructions in these server applications.
Volume 23, Issue 2, Pages 231-234
Citations: 0
Characterization and Analysis of the 3D Gaussian Splatting Rendering Pipeline
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-11-21 DOI: 10.1109/LCA.2024.3504579
Jiwon Lee;Yunjae Lee;Youngeun Kwon;Minsoo Rhu
Abstract: Novel view synthesis, a task generating a 2D image frame from a specific viewpoint within a 3D object or scene, plays a crucial role in 3D rendering. Neural Radiance Field (NeRF) emerged as a prominent method for implementing novel view synthesis, but 3D Gaussian Splatting (3DGS) recently began to emerge as a viable alternative. Despite the tremendous interest from both academia and industry, there has been a lack of research to identify the computational bottlenecks of 3DGS, which is critical for its deployment in real-world products. In this work, we present a comprehensive end-to-end characterization of the 3DGS rendering pipeline, identifying the alpha blending stage within the tile-based rasterizer as causing a significant performance bottleneck. Based on our findings, we discuss several future research directions aiming to inspire continued exploration within this burgeoning application domain.
Volume 24, Issue 1, Pages 13-16
Citations: 0
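The alpha blending stage singled out in the 3DGS abstract is, at its core, front-to-back compositing of depth-sorted contributions per pixel. The Python sketch below shows that standard compositing loop for one pixel under simplifying assumptions: each splat has already been reduced to an RGB color and an opacity at this pixel, and the early-exit threshold `min_transmittance` is an illustrative parameter. The function name `blend_pixel` is hypothetical and the code is not taken from the paper.

```python
def blend_pixel(splats, min_transmittance=1e-4):
    """Front-to-back alpha blending for one pixel.

    splats: iterable of (rgb, alpha) pairs sorted nearest first, where rgb is a
            3-tuple of floats and alpha is the splat's opacity at this pixel.
    Returns (color, remaining_transmittance).
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0                    # fraction of light still passing through
    for rgb, alpha in splats:
        weight = alpha * transmittance     # contribution of this splat
        for channel in range(3):
            color[channel] += weight * rgb[channel]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break                          # pixel is effectively opaque; stop early
    return color, transmittance
```

Each step depends on the transmittance left by the previous one, so the loop over splats for a given pixel is inherently sequential.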
SPGPU: Spatially Programmed GPU
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-11-14 DOI: 10.1109/LCA.2024.3499339
Shizhuo Zhu;Illia Shkirko;Jacob Levinson;Zhengrong Wang;Tony Nowatzki
Abstract: Communication is a critical bottleneck for GPUs, manifesting as energy and performance overheads due to network-on-chip (NoC) delay and congestion. While many algorithms exhibit locality among thread blocks and accessed data, modern GPUs lack the interface to exploit this locality: GPU thread blocks are mapped to cores obliviously. In this work, we explore a simple extension to the conventional GPU programming interface to enable control over the spatial placement of data and threads, yielding new opportunities for aggressive locality optimizations within a GPU kernel. Across 7 workloads that can take advantage of these optimizations, for a 32 (or 128) SM GPU: we achieve a 1.28× (1.54×) speedup and 35% (44%) reduction in NoC traffic, compared to baseline non-spatial GPUs.
Volume 23, Issue 2, Pages 223-226
Citations: 0
Quantum Assertion Scheme for Assuring Qudit Robustness
IF 1.4, Zone 3, Computer Science
IEEE Computer Architecture Letters Pub Date : 2024-11-04 DOI: 10.1109/LCA.2024.3483840
Navnil Choudhury;Chao Lu;Kanad Basu
Abstract: Noisy Intermediate-Scale Quantum (NISQ) computers are impeded by constraints such as limited qubit count and susceptibility to noise, hindering the progression towards fault-tolerant quantum computing for intricate and practical applications. To augment the computational capabilities of quantum computers, research is gravitating towards qudits featuring more than two energy levels. This paper presents the inaugural examination of the repercussions of errors in qudit circuits. Subsequently, we introduce an innovative qudit-based assertion framework aimed at automatically detecting and reporting errors and warnings during the quantum circuit design and compilation process. Our proposed framework, when subjected to evaluation on existing quantum computing platforms, can detect both new and existing bugs with up to 100% coverage of the bugs mentioned in this paper.
Volume 23, Issue 2, Pages 247-250
Citations: 0