IEEE Computer Architecture Letters: Latest Articles (Page 4)

Estimating CPI Stacks From Multiplexed Performance Counter Data Using Machine Learning

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-04-01 DOI: 10.1109/LCA.2025.3556644

Daniel Puckett;Tyler Tomer;Paul V. Gratz;Jiang Hu;Galen Shipman;Jered Dominguez-Trujillo;Kevin Sheridan

Citations: 0

Accelerating Control Flow on CGRAs via Speculative Iteration Execution

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-26 DOI: 10.1109/LCA.2025.3554777

Heng Cao;Zhipeng Wu;Dejian Li;Peiguang Jing;Sio Hang Pun;Yu Liu

Citations: 0

Approximate SFQ-Based Computing Architecture Modeling With Device-Level Guidelines

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-26 DOI: 10.1109/LCA.2025.3573740

Pratiksha Mundhe;Yuta Hano;Satoshi Kawakami;Teruo Tanimoto;Masamitsu Tanaka;Koji Inoue;Ilkwon Byun

Citations: 0

Exploiting Intel AMX Power Gating

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-26 DOI: 10.1109/LCA.2025.3555183

Joshua Kalyanapu;Farshad Dizani;Azam Ghanbari;Darsh Asher;Samira Mirbagher Ajorpaz

Citations: 0

X-PPR: Post Package Repair for CXL Memory

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-21 DOI: 10.1109/LCA.2025.3552190

Chihun Song;Michael Jaemin Kim;Yan Sun;Houxiang Ji;Kyungsan Kim;TaeKyeong Ko;Jung Ho Ahn;Nam Sung Kim

Abstract: CXL is an emerging interface that can cost-efficiently expand the capacity and bandwidth of servers, recycling DRAM modules from retired servers. Such DRAM modules, however, will likely have many uncorrectable faulty words due to years of strenuous use in datacenters. To repair faulty words in the field, a few solutions based on Post Package Repair (PPR) and memory offlining have been proposed. Nonetheless, they are either unable to fix thousands of faulty words or prone to causing severe memory fragmentation, as they operate at the granularity of DRAM row and memory page addresses, respectively. In this work, for cost-efficient use of recycled DRAM modules with thousands of faulty words, we propose CXL-PPR (X-PPR), exploiting the CXL’s support for near-memory processing and variable memory access latency. We demonstrate that X-PPR implemented in a commercial CXL device with DDR4 DRAM modules can handle a faulty bit probability that is $3.3 \times 10^{4}$ higher than ECC for a 512GB DRAM module. Meanwhile, X-PPR negligibly degrades the performance of popular memory-intensive benchmarks, which is achieved through two mechanisms designed in X-PPR to minimize the performance impact of additional DRAM accesses required for repairing faulty words.

IEEE Computer Architecture Letters, Vol. 24, No. 1, pp. 97-100. Journal Article. https://doi.org/10.1109/LCA.2025.3552190

Citations: 0

srNAND: A Novel NAND Flash Organization for Enhanced Small Read Throughput in SSDs

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-19 DOI: 10.1109/LCA.2025.3571321

Jeongho Lee;Sangjun Kim;Jaeyong Lee;Jaeyoung Kang;Sungjin Lee;Nam Sung Kim;Jihong Kim

Citations: 0

DynaFlow: An ML Framework for Dynamic Dataflow Selection in SpGEMM Accelerators

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-15 DOI: 10.1109/LCA.2025.3570667

Sanjali Yadav;Bahar Asgari

Citations: 0

Cosmos: A CXL-Based Full In-Memory System for Approximate Nearest Neighbor Search

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-14 DOI: 10.1109/LCA.2025.3570235

Seoyoung Ko;Hyunjeong Shim;Wanju Doh;Sungmin Yun;Jinin So;Yongsuk Kwon;Sang-Soo Park;Si-Dong Roh;Minyong Yoon;Taeksang Song;Jung Ho Ahn

Citations: 0

Minimal Counters, Maximum Insight: Simplifying System Performance With HPC Clusters for Optimized Monitoring

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-14 DOI: 10.1109/LCA.2025.3570157

Shubhi Shukla;Abhijeet Singh;Rajdeep Chakraborty;Anirban Chakraborty;Tejas Rathod;Harshal Mumbaikar;Manoj Kumar Munigala;Madhusudhan K N;Pabitra Mitra;Debdeep Mukhopadhyay

Citations: 0

SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads

IF 1.4, Zone 3, Computer Science

IEEE Computer Architecture Letters Pub Date : 2025-03-11 DOI: 10.1109/LCA.2025.3549423

Amin Mamandipoor;Huy Dinh Tran;Mohammad Alian

Abstract: Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. To keep up with the ever-increasing network rates, many CPU cycles are spent on the networking tax. We make a key observation that network processing threads can be simultaneously executed on server CPUs with minimal interference with the application threads. However, utilizing simultaneous multithreading (SMT) to scale the number of network threads with the number of application threads suffers from (1) failing to provide strict tail latency requirements for latency-critical applications, and (2) reducing the number of available hardware threads for application processes, thus contributing to a high datacenter network tax. In this work, we design, implement, and evaluate a chip-multiprocessor (CMP) with specialized Simultaneous Data-delivery Threads (SDT) per physical core. The key insight is that with judicious partitioning at the architectural level, SDT can safely co-run with application processes with guaranteed performance isolation. Our evaluation results, using full-system simulation, show that a 20-core CMP enhanced with SDT reduces the area and power consumption of a baseline 40-core CMP by 47.5% and 66%, respectively, while reducing network throughput by less than 10%.

IEEE Computer Architecture Letters, Vol. 24, No. 1, pp. 93-96. Journal Article. https://doi.org/10.1109/LCA.2025.3549423

Citations: 0