BoostTM: Best-effort performance guarantees in best-effort hardware transactional memory for distributed manycore architectures
Authors: Li Wan, Zhiyuan Zhang, Chao Fu, Qiang Li, Jun Han
Journal: Journal of Systems Architecture, vol. 167, Article 103481 (JCR Q1, Computer Science, Hardware & Architecture; impact factor 3.7)
Published: 2025-07-03
DOI: 10.1016/j.sysarc.2025.103481
URL: https://www.sciencedirect.com/science/article/pii/S1383762125001535
Citations: 0
Abstract
Concurrent access to shared data in multithreaded programming remains a performance bottleneck in Chip-Multiprocessor (CMP) systems. Best-effort Hardware Transactional Memory (HTM) offers a potential solution but faces critical constraints: frequent livelocks due to the requester-wins conflict strategy, inability to coexist with non-speculative fallback paths, and vulnerability to non-conflict-induced abort events, such as cache overflows and core exceptions. Mainstream CMP platforms, which typically feature out-of-order cores and distributed last-level caches (LLCs), introduce additional challenges for HTM optimization. This paper first formalizes these constraints and provides a theoretical performance analysis of our previous work, LockillerTM, highlighting its inherent advantages. We then introduce BoostTM, an enhanced version of LockillerTM designed for mainstream CMP systems. BoostTM incorporates design improvements to address the identified challenges and introduces a core exception handling mechanism to fill the gap left by LockillerTM in alleviating non-conflict-induced abort events. Finally, we extend the gem5 infrastructure to validate and evaluate BoostTM on a newly configured experimental platform with 32 out-of-order cores and distributed LLCs. Our evaluation demonstrates that BoostTM outperforms best-effort HTM, LockillerTM, and recent works—LosaTM-SAFU and CIT—with minimal overhead, providing a comprehensive understanding of the effectiveness and adaptability of each mechanism.
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.