IEEE Computer Architecture Letters最新文献

筛选
英文 中文
PIMsynth: A Unified Compiler Framework for Bit-Serial Processing-in-Memory Architectures 内存中位串行处理体系结构的统一编译器框架
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-08-19 DOI: 10.1109/LCA.2025.3600588
Deyuan Guo;Mohammadhosein Gholamrezaei;Matthew Hofmann;Ashish Venkat;Zhiru Zhang;Kevin Skadron
{"title":"PIMsynth: A Unified Compiler Framework for Bit-Serial Processing-in-Memory Architectures","authors":"Deyuan Guo;Mohammadhosein Gholamrezaei;Matthew Hofmann;Ashish Venkat;Zhiru Zhang;Kevin Skadron","doi":"10.1109/LCA.2025.3600588","DOIUrl":"https://doi.org/10.1109/LCA.2025.3600588","url":null,"abstract":"Bit-serial processing-in-memory (PIM) architectures have been extensively studied, yet a standardized tool for generating efficient bit-serial code is lacking, hindering fair comparisons. We present a fully automated compiler framework, PIMsynth, for bit-serial PIM architectures, targeting both digital and analog substrates. The compiler takes Verilog as input and generates optimized micro-operation code for programmable bit-serial PIM backends. Our flow integrates logic synthesis, optimization steps, instruction scheduling, and backend code generation into a unified toolchain. With the compiler, we provide a bit-serial compilation benchmark suite designed for efficient bit-serial code generation. To enable correctness and performance validation, we extend an existing PIM simulator to support compiler-generated micro-op-level workloads. Preliminary results demonstrate that the compiler generates competitive bit-serial code within <inline-formula><tex-math>$1.08times$</tex-math></inline-formula> and <inline-formula><tex-math>$1.54times$</tex-math></inline-formula> of hand-optimized digital and analog PIM baselines.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"277-280"},"PeriodicalIF":1.4,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
AiDE: Attention-FFN Disaggregated Execution for Cost-Effective LLM Decoding on CXL-PNM 基于CXL-PNM的低成本LLM译码的注意- ffn分解执行
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-08-11 DOI: 10.1109/LCA.2025.3597323
KyungSoo Kim;Omin Kwon;Yeonhong Park;Jae W. Lee
{"title":"AiDE: Attention-FFN Disaggregated Execution for Cost-Effective LLM Decoding on CXL-PNM","authors":"KyungSoo Kim;Omin Kwon;Yeonhong Park;Jae W. Lee","doi":"10.1109/LCA.2025.3597323","DOIUrl":"https://doi.org/10.1109/LCA.2025.3597323","url":null,"abstract":"Disaggregating the prefill and decode phases has recently emerged as a promising strategy in the large language model (LLM) serving systems, driven by the distinct resource demands of each phase. Inspired by this coarse-grained disaggregation, we identify a similar opportunity within the decode phase itself: the feedforward network (FFN) is compute-intensive, whereas attention is constrained by memory bandwidth and capacity due to its key-value (KV) cache. To exploit this heterogeneity, we introduce AiDE, a heterogeneous decoding cluster that executes FFN operations on GPUs while offloading attention computations to Compute Express Link-based Processing Near Memory (CXL-PNM) devices. CXL-PNM provides scalable memory bandwidth and capacity, making it well-suited for attention-heavy workloads. In addition, we propose a batch-level pipelining approach enhanced with request scheduling to optimize the utilization of heterogeneous resources. Our AiDE architecture achieves up to 3.87× higher throughput, 2.72× lower p90 time per output token (TPOT), and a 2.31× reduction in decode latency compared to a GPU-only baseline, at comparable cost, demonstrating significant potential of fine-grained disaggregation for cost-effective LLM inference.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"285-288"},"PeriodicalIF":1.4,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CABANA : Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS With Intel AMX 用Intel AMX加速十亿规模ANNS的集群感知查询批处理
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-08-08 DOI: 10.1109/LCA.2025.3596970
Minho Kim;Houxiang Ji;Jaeyoung Kang;Hwanjun Lee;Daehoon Kim;Nam Sung Kim
{"title":"CABANA : Cluster-Aware Query Batching for Accelerating Billion-Scale ANNS With Intel AMX","authors":"Minho Kim;Houxiang Ji;Jaeyoung Kang;Hwanjun Lee;Daehoon Kim;Nam Sung Kim","doi":"10.1109/LCA.2025.3596970","DOIUrl":"https://doi.org/10.1109/LCA.2025.3596970","url":null,"abstract":"Retrieval-augmented generation (RAG) systems increasingly rely on Approximate Nearest Neighbor Search (ANNS) to efficiently retrieve relevant context from billion-scale vector databases. While IVF-based ANNS frameworks scale well overall, the fine search stage remains a bottleneck due to its compute-intensive GEMV operations, particularly under large query volumes. To address this, we propose <monospace>CABANA</monospace>, a <u>c</u>luster-<u>a</u>ware query <u>b</u>atching for <u>AN</u>NS <u>a</u>cceleration mechanism using Intel Advanced Matrix Extensions (AMX) that reformulates these GEMV computations into high-throughput GEMM operations. By aggregating queries targeting the same clusters, <monospace>CABANA</monospace> enables batched computation during fine search, significantly improving compute intensity and memory access regularity. Evaluations on billion-scale datasets show that <monospace>CABANA</monospace> outperforms traditional SIMD-based implementations, achieving up to <inline-formula><tex-math>$32.6times$</tex-math></inline-formula> higher query throughput with minimal overhead, while maintaining high recall rates.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"289-292"},"PeriodicalIF":1.4,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11120372","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145027944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Checkflow: Low-Overhead Checkpointing for Deep Learning Training Checkflow:用于深度学习训练的低开销检查点
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-08-07 DOI: 10.1109/LCA.2025.3596616
Hangyu Liu;Shouxi Luo;Ke Li;Huanlai Xing;Bo Peng
{"title":"Checkflow: Low-Overhead Checkpointing for Deep Learning Training","authors":"Hangyu Liu;Shouxi Luo;Ke Li;Huanlai Xing;Bo Peng","doi":"10.1109/LCA.2025.3596616","DOIUrl":"https://doi.org/10.1109/LCA.2025.3596616","url":null,"abstract":"During the time-consuming training of deep neural network (DNN) models, the worker has to periodically create checkpoints for tensors like the model parameters and optimizer state to support fast failover. However, due to the high overhead of checkpointing, existing schemes generally create checkpoints at a very low frequency, making recovery inefficient since the unsaved training progress would get lost. In this paper, we propose Checkflow, a low-overhead checkpointing scheme, which enables per-iteration checkpointing for DNN training with minimal or even zero cost of training slowdown. The power of Checkflow stems from the design of <inline-formula><tex-math>$i)$</tex-math></inline-formula> decoupling a tensor’s checkpoint operation into snapshot-then-offload, and <inline-formula><tex-math>$ii)$</tex-math></inline-formula> scheduling these operations appropriately, following the results of the math models. Our experimental results imply that, when the GPU-CPU connection has sufficient bandwidth for the training workload, Checkflow can theoretically overlap all the checkpoint operations for each round of training with the training computation, with trivial or no overhead in peak GPU memory occupancy.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"281-284"},"PeriodicalIF":1.4,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
RAESC: A Reconfigurable AES Countermeasure Architecture for RISC-V With Enhanced Power Side-Channel Resilience RISC-V的可重构AES对抗体系结构与增强的功率侧信道弹性
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-08-01 DOI: 10.1109/LCA.2025.3595003
Nayana Rajeev;Cathrene Biju;Titu Mary Ignatius;Roy Paily Palathinkal;Rekha K James
{"title":"RAESC: A Reconfigurable AES Countermeasure Architecture for RISC-V With Enhanced Power Side-Channel Resilience","authors":"Nayana Rajeev;Cathrene Biju;Titu Mary Ignatius;Roy Paily Palathinkal;Rekha K James","doi":"10.1109/LCA.2025.3595003","DOIUrl":"https://doi.org/10.1109/LCA.2025.3595003","url":null,"abstract":"This paper presents RAESC, a reconfigurable Advanced Encryption Standard (AES) countermeasure hardware design that supports AES-128, AES-192, and AES-256 types, enhancing flexibility and resource efficiency in IoT applications. The design incorporates a countermeasure to protect against Power-based Side Channel Attacks (PSCA) by randomizing the AES type based on input plaintext, ensuring improved security. The RAESC is integrated with an RV32IM RISC-V processor, offering streamlined operation and enhanced system security. Performance analysis shows that RAESC’s adaptive encryption strength achieves a balanced trade-off in area, power, and throughput, making it ideal for resource-constrained, security-sensitive IoT applications. Power traces for CPA attacks are generated on Application Specific Integrated Circuit (ASIC) and the design achieves a notable reduction in the Signal to Noise Ratio (SNR) and an increase in the Measurements to Disclose (MTD), demonstrating strong resilience against cryptographic attacks.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"273-276"},"PeriodicalIF":1.4,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144896825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
RoSR: A Novel Selective Retransmission FPGA Architecture for RDMA NICs 一种新的RDMA网卡选择性重传FPGA架构
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-07-31 DOI: 10.1109/LCA.2025.3594110
Mengting Zhang;Zhichuan Guo;Shining Sun
{"title":"RoSR: A Novel Selective Retransmission FPGA Architecture for RDMA NICs","authors":"Mengting Zhang;Zhichuan Guo;Shining Sun","doi":"10.1109/LCA.2025.3594110","DOIUrl":"https://doi.org/10.1109/LCA.2025.3594110","url":null,"abstract":"Remote Direct Memory Access (RDMA) enables low-latency datacenter networks but suffers from inefficient loss recovery using Go-Back-N (GBN). GBN retransmits entire packet windows, degrading Flow Completion Time (FCT) under congestion. We introduce RoSR, a novel selective retransmission architecture for Field-Programmable Gate Array (FPGA)-based RDMA NICs that supports hardware-accelerated direct writes of out-of-order (OoO) packets. RoSR supports efficient OoO packet reception and enables fine-grained retransmission using a dynamic shared bitmap for packet tracking. By extending the RDMA over Converged Ethernet version 2 (RoCEv2) packet format, RoSR facilitates selective retransmission. It triggers retransmissions via timeouts using bitmap blocks and introduces new Nack-bitmap and rd-req-bitmap messages for loss reporting. Under 1% packet loss, RoSR achieves up to 13.5× (RDMA Write) and 15.6× (RDMA Read) higher throughput than Xilinx ERNIC. In NS-3 simulations using the HPCC RDMA stack, RoSR reduces FCT slowdown by 3× to 6× compared to GBN across various packet loss rates, congestion control algorithms (DCQCN, HPCC, Timely), and traffic patterns, while maintaining robustness under high round-trip time (RTT) conditions.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"269-272"},"PeriodicalIF":1.4,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144896824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency 考虑对能效有害的LLM混合专家权重的SSD卸载
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-07-24 DOI: 10.1109/LCA.2025.3592563
Kwanhee Kyung;Sungmin Yun;Jung Ho Ahn
{"title":"SSD Offloading for LLM Mixture-of-Experts Weights Considered Harmful in Energy Efficiency","authors":"Kwanhee Kyung;Sungmin Yun;Jung Ho Ahn","doi":"10.1109/LCA.2025.3592563","DOIUrl":"https://doi.org/10.1109/LCA.2025.3592563","url":null,"abstract":"Large Language Models (LLMs) applying Mixture-of-Experts (MoE) scale to trillions of parameters but require vast memory, motivating a line of research to offload expert weights from fast-but-small DRAM (HBM) to denser Flash SSDs. While SSDs provide cost-effective capacity, their read energy per bit is substantially higher than that of DRAM. This paper quantitatively analyzes the energy implications of offloading MoE expert weights to SSDs during the critical decode stage of LLM inference. Our analysis, comparing SSD, CPU memory (DDR), and HBM storage scenarios for models like DeepSeek-R1, reveals that offloading MoE weights to current SSDs drastically increases per-token-generation energy consumption (e.g., by up to <inline-formula><tex-math>$sim 12times$</tex-math></inline-formula> compared to the HBM baseline), dominating the total inference energy budget. Although techniques like prefetching effectively hide access latency, they cannot mitigate this fundamental energy penalty. We further explore future technological scaling, finding that the inherent sparsity of MoE models could potentially make SSDs energy-viable <i>if</i> Flash read energy improves significantly, roughly by an order of magnitude.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"265-268"},"PeriodicalIF":1.4,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144880476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Correct Wrong Path 纠正错误路径
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-07-23 DOI: 10.1109/LCA.2025.3542809
Bhargav Reddy Godala;Sankara Prasad Ramesh;Krishnam Tibrewala;Chrysanthos Pepi;Gino Chacon;Svilen Kanev;Gilles A. Pokam;Alberto Ros;Daniel A. Jiménez;Paul V. Gratz;David I. August
{"title":"Correct Wrong Path","authors":"Bhargav Reddy Godala;Sankara Prasad Ramesh;Krishnam Tibrewala;Chrysanthos Pepi;Gino Chacon;Svilen Kanev;Gilles A. Pokam;Alberto Ros;Daniel A. Jiménez;Paul V. Gratz;David I. August","doi":"10.1109/LCA.2025.3542809","DOIUrl":"https://doi.org/10.1109/LCA.2025.3542809","url":null,"abstract":"Modern OOO CPUs have very deep pipelines with large branch misprediction recovery penalties. Speculatively executed instructions on the wrong path can significantly change cache state, depending on speculation levels. Architects often employ trace-driven simulation models in the design exploration stage, which sacrifice precision for speed. Trace-driven simulators are orders of magnitude faster than execution-driven models, reducing the often hundreds of thousands of simulation hours needed to explore new micro-architectural ideas. Despite the strong benefits of trace-driven simulation, it often fails to adequately model the consequences of wrong-path execution because obtaining such traces from real systems is nontrivial. Prior works exclusively consider either pollution or prefetching in the instruction stream/L1-I cache and often ignore the impact on the data stream. Here, we examine wrong path execution in simulation results and design a set of infrastructure for enabling wrong-path execution in a trace driven simulator. Our analysis shows the wrong path affects structures on both the instruction and data sides extensively, resulting in performance variations ranging from <inline-formula><tex-math>$-3.05$</tex-math></inline-formula>% to 20.9% versus ignoring wrong path. To benefit the research community and enhance the accuracy of simulators, we opened our traces and tracing utility in the hopes that industry can provide wrong-path traces generated by their internal simulators, enabling academic simulation without exposing industry IP.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"221-224"},"PeriodicalIF":1.4,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144687792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads 真实硬件上的每行激活计数:揭开性能开销的神秘面纱
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-07-10 DOI: 10.1109/LCA.2025.3587293
Jumin Kim;Seungmin Baek;Minbok Wi;Hwayong Nam;Michael Jaemin Kim;Sukhan Lee;Kyomin Sohn;Jung Ho Ahn
{"title":"Per-Row Activation Counting on Real Hardware: Demystifying Performance Overheads","authors":"Jumin Kim;Seungmin Baek;Minbok Wi;Hwayong Nam;Michael Jaemin Kim;Sukhan Lee;Kyomin Sohn;Jung Ho Ahn","doi":"10.1109/LCA.2025.3587293","DOIUrl":"https://doi.org/10.1109/LCA.2025.3587293","url":null,"abstract":"Per-Row Activation Counting (PRAC), a DRAM read disturbance mitigation method, modifies key DRAM timing parameters, reportedly causing significant performance overheads in simulator-based studies. However, given known discrepancies between simulators and real hardware, real-machine experiments are vital for accurate PRAC performance estimation. We present the first real-machine performance analysis of PRAC. After verifying timing modifications on the latest CPUs using microbenchmarks, our analysis shows that PRAC’s average and maximum overheads are just 1.06% and 3.28% for the SPEC CPU2017 workloads—up to 9.15× lower than simulator-based reports. Further, we show that the close page policy minimizes this overhead by effectively hiding the elongated DRAM row precharge operations due to PRAC from the critical path.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"217-220"},"PeriodicalIF":1.4,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144687689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Old is Gold: Optimizing Single-Threaded Applications With ExGen-Malloc 老即是金:用ExGen-Malloc优化单线程应用程序
IF 1.4 3区 计算机科学
IEEE Computer Architecture Letters Pub Date : 2025-07-10 DOI: 10.1109/LCA.2025.3587582
Ruihao Li;Lizy K. John;Neeraja J. Yadwadkar
{"title":"Old is Gold: Optimizing Single-Threaded Applications With ExGen-Malloc","authors":"Ruihao Li;Lizy K. John;Neeraja J. Yadwadkar","doi":"10.1109/LCA.2025.3587582","DOIUrl":"https://doi.org/10.1109/LCA.2025.3587582","url":null,"abstract":"Memory allocators, though constituting a small portion of the entire program code, can significantly impact application performance by affecting global factors such as cache behaviors. Moreover, memory allocators are often regarded as a “datacenter tax” inherent to all programs. Even a 1% improvement in performance can lead to significant cost and energy savings when scaled across an entire datacenter fleet. Modern memory allocators are designed to optimize allocation speed and memory fragmentation in multi-threaded environments, relying on complex metadata and control logic to achieve high performance. However, the overhead introduced by this complexity prompts a reevaluation of allocator design. Notably, such overhead can be avoided in single-threaded scenarios, which continue to be widely used across diverse application domains. In this paper, we present <i>ExGen-Malloc</i>, a memory allocator specifically optimized for single-threaded applications. We prototyped <i>ExGen-Malloc</i> on a real system and demonstrated that it achieves a geometric mean speedup of <inline-formula><tex-math>$1.19 times$</tex-math></inline-formula> over dlmalloc and <inline-formula><tex-math>$1.03 times$</tex-math></inline-formula> over mimalloc, a modern multi-threaded allocator developed by Microsoft, on the SPEC CPU2017 benchmark suite.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"225-228"},"PeriodicalIF":1.4,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144687664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信