BoostTM: Best-effort performance guarantees in best-effort hardware transactional memory for distributed manycore architectures

IF 3.7 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Li Wan, Zhiyuan Zhang, Chao Fu, Qiang Li, Jun Han
Journal of Systems Architecture, vol. 167, Article 103481. Published 2025-07-03. Journal article.
DOI: 10.1016/j.sysarc.2025.103481
URL: https://www.sciencedirect.com/science/article/pii/S1383762125001535
Citations: 0

Abstract

Concurrent access to shared data in multithreaded programming remains a performance bottleneck in Chip-Multiprocessor (CMP) systems. Best-effort Hardware Transactional Memory (HTM) offers a potential solution but faces critical constraints: frequent livelocks due to the requester-wins conflict strategy, inability to coexist with non-speculative fallback paths, and vulnerability to non-conflict-induced abort events, such as cache overflows and core exceptions. Mainstream CMP platforms, which typically feature out-of-order cores and distributed last-level caches (LLCs), introduce additional challenges for HTM optimization. This paper first formalizes these constraints and provides a theoretical performance analysis of our previous work, LockillerTM, highlighting its inherent advantages. We then introduce BoostTM, an enhanced version of LockillerTM designed for mainstream CMP systems. BoostTM incorporates design improvements to address the identified challenges and introduces a core exception handling mechanism to fill the gap left by LockillerTM in alleviating non-conflict-induced abort events. Finally, we extend the gem5 infrastructure to validate and evaluate BoostTM on a newly configured experimental platform with 32 out-of-order cores and distributed LLCs. Our evaluation demonstrates that BoostTM outperforms best-effort HTM, LockillerTM, and recent works—LosaTM-SAFU and CIT—with minimal overhead, providing a comprehensive understanding of the effectiveness and adaptability of each mechanism.
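
The livelock risk of the requester-wins strategy mentioned in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's model and not gem5 code: the names `Tx`, `step`, and `simulate`, the two cache lines "A" and "B", and the strict round-robin interleaving are all hypothetical choices made for illustration. Two transactions write the same two lines in opposite order, and whenever one requests a line the other holds speculatively, the holder is aborted.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    """Toy transaction (hypothetical model, not taken from the paper)."""
    accesses: list                            # lines written, in program order
    pc: int = 0                               # next access; len(accesses) = ready to commit
    held: set = field(default_factory=set)    # speculative write set
    commits: int = 0
    aborts: int = 0

    def abort(self):
        # Discard speculative state and restart from the first access.
        self.held.clear()
        self.pc = 0
        self.aborts += 1


def step(tx, others):
    """Advance tx by one step under a requester-wins policy."""
    if tx.pc == len(tx.accesses):
        # Every access completed without being aborted: commit and release.
        tx.held.clear()
        tx.pc = 0
        tx.commits += 1
        return
    line = tx.accesses[tx.pc]
    for other in others:
        if line in other.held:
            other.abort()                     # requester wins: the holder is aborted
    tx.held.add(line)
    tx.pc += 1


def simulate(steps):
    """Interleave two conflicting transactions in strict round-robin order."""
    t0 = Tx(accesses=["A", "B"])
    t1 = Tx(accesses=["B", "A"])
    txs = [t0, t1]
    for i in range(steps):
        tx = txs[i % 2]
        step(tx, [t for t in txs if t is not tx])
    return t0, t1


if __name__ == "__main__":
    t0, t1 = simulate(60)
    print("commits:", t0.commits, t1.commits)
    print("aborts: ", t0.aborts, t1.aborts)
```

Under this particular schedule each transaction is aborted by the other just before it would commit, so neither makes progress no matter how many steps are simulated. Real hardware timing is far less regular, so the sketch shows only that the policy permits livelock, not how often it occurs in practice.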
Source journal: Journal of Systems Architecture (Engineering and Technology, Computer Science: Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Articles published: 226
Review time: 46 days
Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, and parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.