BoostTM: Best-effort performance guarantees in best-effort hardware transactional memory for distributed manycore architectures

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Systems Architecture Pub Date : 2025-07-03 DOI:10.1016/j.sysarc.2025.103481

Li Wan, Zhiyuan Zhang, Chao Fu, Qiang Li, Jun Han

{"title":"BoostTM: Best-effort performance guarantees in best-effort hardware transactional memory for distributed manycore architectures","authors":"Li Wan, Zhiyuan Zhang, Chao Fu, Qiang Li, Jun Han","doi":"10.1016/j.sysarc.2025.103481","DOIUrl":null,"url":null,"abstract":"<div><div>Concurrent access to shared data in multithreaded programming remains a performance bottleneck in Chip-Multiprocessor (CMP) systems. Best-effort Hardware Transactional Memory (HTM) offers a potential solution but faces critical constraints: frequent livelocks due to the requester-wins conflict strategy, inability to coexist with non-speculative fallback paths, and vulnerability to non-conflict-induced abort events, such as cache overflows and core exceptions. Mainstream CMP platforms, which typically feature out-of-order cores and distributed last-level caches (LLCs), introduce additional challenges for HTM optimization. This paper first formalizes these constraints and provides a theoretical performance analysis of our previous work, LockillerTM, highlighting its inherent advantages. We then introduce BoostTM, an enhanced version of LockillerTM designed for mainstream CMP systems. BoostTM incorporates design improvements to address the identified challenges and introduces a core exception handling mechanism to fill the gap left by LockillerTM in alleviating non-conflict-induced abort events. Finally, we extend the gem5 infrastructure to validate and evaluate BoostTM on a newly configured experimental platform with 32 out-of-order cores and distributed LLCs. Our evaluation demonstrates that BoostTM outperforms best-effort HTM, LockillerTM, and recent works—LosaTM-SAFU and CIT—with minimal overhead, providing a comprehensive understanding of the effectiveness and adaptability of each mechanism.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"167 ","pages":"Article 103481"},"PeriodicalIF":3.7000,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762125001535","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Concurrent access to shared data in multithreaded programming remains a performance bottleneck in Chip-Multiprocessor (CMP) systems. Best-effort Hardware Transactional Memory (HTM) offers a potential solution but faces critical constraints: frequent livelocks due to the requester-wins conflict strategy, inability to coexist with non-speculative fallback paths, and vulnerability to non-conflict-induced abort events, such as cache overflows and core exceptions. Mainstream CMP platforms, which typically feature out-of-order cores and distributed last-level caches (LLCs), introduce additional challenges for HTM optimization. This paper first formalizes these constraints and provides a theoretical performance analysis of our previous work, LockillerTM, highlighting its inherent advantages. We then introduce BoostTM, an enhanced version of LockillerTM designed for mainstream CMP systems. BoostTM incorporates design improvements to address the identified challenges and introduces a core exception handling mechanism to fill the gap left by LockillerTM in alleviating non-conflict-induced abort events. Finally, we extend the gem5 infrastructure to validate and evaluate BoostTM on a newly configured experimental platform with 32 out-of-order cores and distributed LLCs. Our evaluation demonstrates that BoostTM outperforms best-effort HTM, LockillerTM, and recent works—LosaTM-SAFU and CIT—with minimal overhead, providing a comprehensive understanding of the effectiveness and adaptability of each mechanism.

查看原文本刊更多论文

BoostTM：在分布式多核体系结构的硬件事务内存中保证最佳性能

多线程编程中对共享数据的并发访问仍然是芯片多处理器（CMP）系统的性能瓶颈。尽力而为硬件事务性内存（Best-effort Hardware Transactional Memory， HTM）提供了一种潜在的解决方案，但它面临着一些关键的限制：由于请求者获胜的冲突策略导致频繁的活动锁，无法与非推测性的回退路径共存，以及容易受到非冲突引发的中止事件的影响，比如缓存溢出和核心异常。主流CMP平台通常具有乱序内核和分布式最后一级缓存（llc），这给HTM优化带来了额外的挑战。本文首先形式化了这些约束，并对我们之前的工作LockillerTM进行了理论性能分析，突出了其固有的优势。然后介绍BoostTM，这是LockillerTM的增强版本，专为主流CMP系统设计。BoostTM整合了设计改进，以解决已确定的挑战，并引入了核心异常处理机制，以填补LockillerTM在缓解非冲突引起的中断事件方面留下的空白。最后，我们扩展了gem5基础架构，在一个新配置的实验平台上验证和评估BoostTM，该平台具有32个乱序内核和分布式llc。我们的评估表明，BoostTM以最小的开销胜过了尽力而为的HTM、LockillerTM和最近的工作——losatm - safu和cit——提供了对每种机制的有效性和适应性的全面理解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Systems Architecture 工程技术-计算机：硬件

CiteScore

8.70

自引率

15.60%

发文量

226

审稿时长

46 days

期刊介绍： The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.